[ 
https://issues.apache.org/jira/browse/STORM-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-450:
--------------------------------------

    Summary: Netty can cause error on clean shutdown of worker  (was: netty 
cascading failure on clean shutdown of worker)

> Netty can cause error on clean shutdown of worker
> -------------------------------------------------
>
>                 Key: STORM-450
>                 URL: https://issues.apache.org/jira/browse/STORM-450
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating, 0.9.0.1, 0.9.3-incubating
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> We recently had an issue where a worker process was shut down cleanly on 
> 0.9.0.  The reason the worker shut down cleanly is not the issue here, but it 
> caused a cascading failure that made a connected worker shut down too.  This 
> is going to be even more problematic in newer versions of Storm, where we give 
> the worker time to shut down cleanly instead of just shooting it with a kill -9.
> Ideally the client should continue to try to reconnect, because the worker 
> may have exited on its own and will be re-spawned shortly.  If it is 
> rescheduled elsewhere, the worker will eventually detect it and reroute things 
> accordingly.  This is what already happens when the connection is just 
> closed.  There really is no reason for one side to know when the other side 
> is shutting down.
> {code}
> 2014-08-11 19:00:17 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>       at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:130) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:101) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.disruptor$consume_loop_STAR_$fn__1999.invoke(disruptor.clj:74) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.util$async_loop$fn__421.invoke(util.clj:400) ~[storm-core-0.9.0-wip21.jar:na]
>       at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>       at java.lang.Thread.run(Thread.java:722) [na:1.7.0_17]
> Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>       at backtype.storm.messaging.netty.Client.send(Client.java:118) ~[storm-netty-0.9.0-wip21.jar:na]
>       at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922$fn__4923.invoke(worker.clj:342) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__4922.invoke(worker.clj:331) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.disruptor$clojure_handler$reify__1986.onEvent(disruptor.clj:43) ~[storm-core-0.9.0-wip21.jar:na]
>       at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:127) ~[storm-core-0.9.0-wip21.jar:na]
>       ... 6 common frames omitted
> 2014-08-11 19:00:17 b.s.util [INFO] Halting process: ("Async loop died!")
> {code}
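
To illustrate the behavior requested in the description (a closed peer treated as a transient condition rather than a fatal one), here is a minimal sketch. It is not Storm's or Netty's code: Connection and ReconnectingSender are hypothetical names invented for this example, and the bounded buffer that drops the oldest message when full is an arbitrary assumption, not something the issue proposes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for a network connection; not a Storm or Netty type. */
interface Connection {
    /** Returns false if the peer is unreachable or has closed the connection. */
    boolean trySend(String message);
}

/**
 * Hypothetical sender that never throws when the peer has gone away.
 * Messages are held in a bounded buffer and retried on later calls,
 * so the calling thread keeps running while the peer is re-spawned.
 */
class ReconnectingSender {
    private final Connection connection;
    private final int maxPending;
    private final Queue<String> pending = new ArrayDeque<>();

    ReconnectingSender(Connection connection, int maxPending) {
        this.connection = connection;
        this.maxPending = maxPending;
    }

    /** Queues the message, attempts delivery, and returns how many messages are still pending. */
    int send(String message) {
        if (pending.size() >= maxPending) {
            pending.poll(); // assumption for this sketch: drop the oldest message when full
        }
        pending.add(message);
        return flush();
    }

    /** Delivers pending messages in order and stops at the first failure. */
    int flush() {
        while (!pending.isEmpty() && connection.trySend(pending.peek())) {
            pending.poll();
        }
        return pending.size();
    }
}
```

With this shape, a failed send simply leaves the message queued, and a later send or flush call delivers it once the peer is reachable again. A real client would also need a reconnect schedule and a policy for messages that can never be delivered.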



--
This message was sent by Atlassian JIRA
(v6.2#6252)
