[
https://issues.apache.org/jira/browse/STORM-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Joseph Evans updated STORM-450:
--------------------------------------
Summary: Netty can cause error on clean shutdown of worker (was: netty
cascading failure on clean shutdown of worker)
> Netty can cause error on clean shutdown of worker
> -------------------------------------------------
>
> Key: STORM-450
> URL: https://issues.apache.org/jira/browse/STORM-450
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Affects Versions: 0.9.2-incubating, 0.9.0.1, 0.9.3-incubating
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
>
> We recently had an issue where a worker process was shut down cleanly on
> 0.9.0. Why the worker shut down cleanly is not the issue here, but the
> shutdown caused a cascading failure that made a connected worker shut down
> too. This is going to be even more problematic in newer versions of Storm,
> where we give the worker time to shut down cleanly instead of just shooting
> it with a kill -9.
> Ideally the client should continue to try to reconnect, because the worker
> may have exited on its own and will be re-spawned shortly. If it is
> rescheduled elsewhere, the worker will eventually detect it and reroute
> things accordingly. This is what already happens when the connection is
> just closed. There really is no reason for one side to know when the other
> side is shutting down.
> {code}
> 2014-08-11 19:00:17 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>     at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:130) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:101) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.disruptor$consume_loop_STAR_$fn__1999.invoke(disruptor.clj:74) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.util$async_loop$fn__421.invoke(util.clj:400) ~[storm-core-0.9.0-wip21.jar:na]
>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>     at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]
> Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>     at backtype.storm.messaging.netty.Client.send(Client.java:118) ~[storm-netty-0.9.0-wip21.jar:na]
>     at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922$fn__4923.invoke(worker.clj:342) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922.invoke(worker.clj:331) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.disruptor$clojure_handler$reify__1986.onEvent(disruptor.clj:43) ~[storm-core-0.9.0-wip21.jar:na]
>     at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:127) ~[storm-core-0.9.0-wip21.jar:na]
>     ... 6 common frames omitted
> 2014-08-11 19:00:17 b.s.util [INFO] Halting process: ("Async loop died!")
> {code}
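
To illustrate the behavior proposed in the description (keep trying to reconnect instead of throwing when the peer is closing), here is a minimal sketch. It is not Storm's actual Netty client: `ReconnectSketch`, `Connection`, `FlakyConnection`, `RetryingSender`, and all parameters are hypothetical names invented for this example, and the retry limit and backoff values are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class ReconnectSketch {

    /** Hypothetical transport interface; stands in for a real network connection. */
    public interface Connection {
        void write(String message) throws ConnectionClosedException;
    }

    /** Hypothetical signal that the peer has closed or is closing. */
    public static class ConnectionClosedException extends Exception {
        public ConnectionClosedException(String msg) { super(msg); }
    }

    /** Stand-in peer that rejects the first `failures` writes, then accepts. */
    public static class FlakyConnection implements Connection {
        private int remainingFailures;
        public final List<String> delivered = new ArrayList<>();

        public FlakyConnection(int failures) { this.remainingFailures = failures; }

        @Override
        public void write(String message) throws ConnectionClosedException {
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new ConnectionClosedException("peer is shutting down");
            }
            delivered.add(message);
        }
    }

    /** Retries with capped exponential backoff instead of failing the caller. */
    public static class RetryingSender {
        private static final long MAX_DELAY_MS = 1000L; // assumed cap
        private final Connection connection;
        private final int maxAttempts;
        private final long baseDelayMs;

        public RetryingSender(Connection connection, int maxAttempts, long baseDelayMs) {
            this.connection = connection;
            this.maxAttempts = maxAttempts;
            this.baseDelayMs = baseDelayMs;
        }

        /** Returns true once delivered, false if every attempt failed. Never throws. */
        public boolean send(String message) {
            long delay = baseDelayMs;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    connection.write(message);
                    return true;
                } catch (ConnectionClosedException e) {
                    if (attempt == maxAttempts) {
                        break; // out of attempts; let the caller decide what to do
                    }
                    try {
                        Thread.sleep(delay);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        return false;
                    }
                    delay = Math.min(delay * 2, MAX_DELAY_MS);
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        FlakyConnection conn = new FlakyConnection(2);
        RetryingSender sender = new RetryingSender(conn, 5, 10L);
        System.out.println("delivered=" + sender.send("tuple-1"));
    }
}
```

The point of the sketch is that a closing peer is reported to the caller as a return value rather than as a RuntimeException, so the sending loop survives and the process is not halted while the other worker is re-spawned or rescheduled.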