Robert Joseph Evans created STORM-450:
-----------------------------------------
Summary: netty cascading failure on clean shutdown of worker
Key: STORM-450
URL: https://issues.apache.org/jira/browse/STORM-450
Project: Apache Storm (Incubating)
Issue Type: Bug
Affects Versions: 0.9.2-incubating, 0.9.0.1, 0.9.3-incubating
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
We recently had an issue where a worker process shut down cleanly on 0.9.0.
Why the worker shut down cleanly is not the issue here, but the shutdown caused a
cascading failure that made a connected worker shut down too. This is going to
be even more problematic in newer versions of Storm, where we give the worker
time to shut down cleanly instead of just shooting it with a kill -9.
Ideally the client should keep trying to reconnect, because the worker may
have exited on its own and will be respawned shortly. If it is rescheduled
elsewhere, the connected worker will eventually detect this and reroute messages
accordingly. This is already what happens when the connection is simply closed.
There really is no reason for one side to know when the other side is shutting down.
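A minimal Java sketch of the reconnect-instead-of-reject behavior described above. It assumes a hypothetical `Transport` interface standing in for the real Netty channel; `ReconnectingClient`, `FakeTransport`, and the exponential backoff parameters are illustrative names, not Storm APIs. The point is that a peer shutdown is handled like any other disconnect: sends are buffered and the client reconnects, rather than throwing "Client is being closed".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical abstraction over the real connection; not a Storm or Netty API.
interface Transport {
    boolean connect();          // returns true if a connection was established
    boolean isConnected();
    void write(byte[] message); // only called while connected
}

// Sketch: a client that treats a peer shutdown like any other disconnect.
// Sends are buffered while disconnected instead of being rejected with an exception.
final class ReconnectingClient {
    private final Transport transport;
    private final Deque<byte[]> pending = new ArrayDeque<>();
    private final long baseDelayMs;
    private final long maxDelayMs;
    private int failedAttempts = 0;

    ReconnectingClient(Transport transport, long baseDelayMs, long maxDelayMs) {
        this.transport = transport;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Never throws just because the remote worker went away.
    void send(byte[] message) {
        pending.addLast(message);
        flush();
    }

    // Reconnects if needed, then drains the buffer in order.
    // Returns false if the connection attempt failed.
    boolean flush() {
        if (!transport.isConnected()) {
            if (!transport.connect()) {
                failedAttempts++;
                return false;
            }
            failedAttempts = 0;
        }
        while (!pending.isEmpty()) {
            transport.write(pending.pollFirst());
        }
        return true;
    }

    // Exponential backoff: base * 2^(failures - 1), capped at maxDelayMs;
    // 0 if the most recent attempt did not fail.
    long nextRetryDelayMs() {
        if (failedAttempts == 0) {
            return 0;
        }
        long delay = baseDelayMs << Math.min(failedAttempts - 1, 20);
        return Math.min(delay, maxDelayMs);
    }

    int pendingCount() {
        return pending.size();
    }
}

// Test double: refuses the first N connection attempts, then accepts.
final class FakeTransport implements Transport {
    private int refusalsLeft;
    private boolean connected = false;
    final List<String> written = new ArrayList<>();

    FakeTransport(int refusals) {
        this.refusalsLeft = refusals;
    }

    public boolean connect() {
        if (refusalsLeft > 0) {
            refusalsLeft--;
            return false;
        }
        connected = true;
        return true;
    }

    public boolean isConnected() { return connected; }

    public void write(byte[] message) { written.add(new String(message)); }

    // Simulates the remote worker shutting down cleanly.
    void dropConnection() { connected = false; }
}
```

The buffer in this sketch is unbounded; a real client would need a cap or a drop policy so a long outage cannot exhaust memory, and something like a timer would call `flush()` again after `nextRetryDelayMs()`.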
{code}
2014-08-11 19:00:17 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:130) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:101) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.disruptor$consume_loop_STAR_$fn__1999.invoke(disruptor.clj:74) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.util$async_loop$fn__421.invoke(util.clj:400) ~[storm-core-0.9.0-wip21.jar:na]
	at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
	at backtype.storm.messaging.netty.Client.send(Client.java:118) ~[storm-netty-0.9.0-wip21.jar:na]
	at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922$fn__4923.invoke(worker.clj:342) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922.invoke(worker.clj:331) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.disruptor$clojure_handler$reify__1986.onEvent(disruptor.clj:43) ~[storm-core-0.9.0-wip21.jar:na]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:127) ~[storm-core-0.9.0-wip21.jar:na]
	... 6 common frames omitted
2014-08-11 19:00:17 b.s.util [INFO] Halting process: ("Async loop died!")
{code}
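The trace shows the chain: `Client.send` throws once the client is closed, the exception escapes the Disruptor event handler for the transfer queue, and the async loop treats it as fatal and halts the worker. Below is a small Java model of that chain. `CascadeSketch`, `ClosingClient`, `runTransferLoop`, and the halt hook are hypothetical stand-ins, not Storm's actual classes; the halt is a callback so the effect can be observed without exiting the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified model of the failure chain in the log; not Storm code.
final class CascadeSketch {

    // Stand-in for a messaging client that rejects sends once it is closed,
    // like the "Client is being closed" error in the trace.
    static final class ClosingClient {
        private boolean closed = false;
        final List<String> sent = new ArrayList<>();

        void close() { closed = true; }

        void send(String msg) {
            if (closed) {
                throw new RuntimeException("Client is being closed, and does not take requests any more");
            }
            sent.add(msg);
        }
    }

    // Stand-in for the worker's transfer loop: any uncaught exception ends the
    // loop and triggers the halt hook ("Async loop died!" then "Halting process").
    static boolean runTransferLoop(List<String> batch, Consumer<String> handler, Runnable halt) {
        try {
            for (String msg : batch) {
                handler.accept(msg);
            }
            return true;
        } catch (RuntimeException e) {
            System.err.println("Async loop died! " + e.getMessage());
            halt.run(); // a real worker would terminate the JVM here
            return false;
        }
    }

    public static void main(String[] args) {
        ClosingClient client = new ClosingClient();
        int[] halts = {0};
        // The first batch goes through; then the peer shuts down and the client is closed.
        boolean first = runTransferLoop(List.of("t1"), client::send, () -> halts[0]++);
        client.close();
        boolean second = runTransferLoop(List.of("t2", "t3"), client::send, () -> halts[0]++);
        System.out.println(first + " " + second + " halts=" + halts[0] + " sent=" + client.sent);
    }
}
```

In this model a single rejected send is enough to stop the sending side, which is the cascade described in the report; treating the peer's shutdown as an ordinary disconnect removes the exception at its source.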
--
This message was sent by Atlassian JIRA
(v6.2#6252)