[
https://issues.apache.org/jira/browse/STORM-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117347#comment-14117347
]
xiajun commented on STORM-404:
------------------------------
I found this error just now. In my situation, supervisor is alive, bug some
worker on this supervisor just died for some reason, then worker on some other
supervisor will exit because the " java.lang.RuntimeException" at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128),
anyone have some suggestions, thanks a lot.
> Worker on one machine crashes due to a failure of another worker on another
> machine
> -----------------------------------------------------------------------------------
>
> Key: STORM-404
> URL: https://issues.apache.org/jira/browse/STORM-404
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Affects Versions: 0.9.2-incubating
> Reporter: Itai Frenkel
>
> I have two workers (one on each machine). The first worker(10.30.206.125) had
> a problem starting (could not find Nimbus host), however the second worker
> crashed too since it could not connect to the first worker.
> This looks like a cascading failure, which seems like a bug.
> 2014-07-15 17:43:32 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [17]
> 2014-07-15 17:43:33 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [18]
> 2014-07-15 17:43:34 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [19]
> 2014-07-15 17:43:35 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [20]
> 2014-07-15 17:43:36 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [21]
> 2014-07-15 17:43:37 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [22]
> 2014-07-15 17:43:38 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [23]
> 2014-07-15 17:43:39 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [24]
> 2014-07-15 17:43:40 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [25]
> 2014-07-15 17:43:41 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [26]
> 2014-07-15 17:43:42 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [27]
> 2014-07-15 17:43:43 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [28]
> 2014-07-15 17:43:44 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [29]
> 2014-07-15 17:43:45 b.s.m.n.Client [INFO] Reconnect started for
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700... [30]
> 2014-07-15 17:43:46 b.s.m.n.Client [INFO] Closing Netty Client
> Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700
> 2014-07-15 17:43:46 b.s.m.n.Client [INFO] Waiting for pending batchs to be
> sent with Netty-Client-ip-10-30-206-125.ec2.internal/10.30.206.125:6700...,
> timeout: 600000ms, pendings: 0
> 2014-07-15 17:43:46 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
> at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
> Caused by: java.lang.RuntimeException: Client is being closed, and does not
> take requests any more
> at backtype.storm.messaging.netty.Client.send(Client.java:194)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:320)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> ... 6 common frames omitted
> 2014-07-15 17:43:46 b.s.util [INFO] Halting process: ("Async loop died!")
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)