[
https://issues.apache.org/jira/browse/STORM-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963167#comment-14963167
]
vishnu rao commented on STORM-1022:
-----------------------------------
I came across the same exception while using STORM '0.10.0.2.3.0.0-2557' in
hortonworks data plaform 2.3.
Before this exception hits, a runtime exception was thrown which resulted in
supervisor restarting workers , there by leading to
'SEVERE: RuntimeException while executing runnable
org.apache.storm.guava.util.concurrent.Futures'.
Here is my scenario:
(1) i have registered a metrics consumer for my topology:
conf.registerMetricsConsumer(NewRelicMetricConsumer.class,10);
(2) the topology starts off (in the example below at 2015-10-19T06:53)
2015-10-19T06:53:17.757-0400 b.s.d.supervisor [INFO]
6afff1fd-4741-49a8-bd38-769146e28d81 still hasn't started
(3) The metrics consumer threw a Runtime exception which results in storm
"Async loop died" and 'Worker died'
-----------------------------------------------------------------------------------------------
015-10-19 06:54:31 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.ClassCastException: java.util.HashMap
cannot be cast to java.lang.Number
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to
java.lang.Number
at
com.pocketmath.nebula.metric.storm.NewRelicMetricConsumer.handleDataPoints(NewRelicMetricConsumer.java:55)
~[stormjar.jar:na]
at
backtype.storm.metric.MetricsConsumerBolt.execute(MetricsConsumerBolt.java:55)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
... 6 common frames omitted
2015-10-19 06:54:31 b.s.d.executor [ERROR]
java.lang.RuntimeException: java.lang.ClassCastException: java.util.HashMap
cannot be cast to java.lang.Number
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to
java.lang.Number
at
com.test.metric.storm.NewRelicMetricConsumer.handleDataPoints(NewRelicMetricConsumer.java:55)
~[stormjar.jar:na]
at
backtype.storm.metric.MetricsConsumerBolt.execute(MetricsConsumerBolt.java:55)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
... 6 common frames omitted
-----------------------------------------------------------------------------------------------
(3) the supervisor detects death and re-launches worker
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Worker Process
6afff1fd-4741-49a8-bd38-769146e28d81 has died!
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Shutting down and clearing
state for id 6afff1fd-4741-49a8-bd38-769146e28d81. Current supervisor time:
1445252082. State: :timed-out, Heartbeat: {:time-secs 1445252079, :storm-id
"ambari_test3-27-1445251979", :executors [[41 41] [33 33] [1 1] [53 53] [65 65]
[9 9] [57 57] [-1 -1] [61 61] [13 13] [21 21] [5 5] [29 29] [45 45] [37 37] [25
25] [49 49] [17 17]], :port 6700}
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Shutting down
5f60d8c8-4895-4dab-ae04-709434648e26:6afff1fd-4741-49a8-bd38-769146e28d81——
2015-10-19T06:54:43.567-0400 b.s.d.supervisor [INFO] Launching worker with
command: ..........................
-----------------------------------------------------------------------------------------------
(4) moments later as we wait for worker startup.. we get the exception:
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3:Oct 19, 2015 7:03:15 AM
org.apache.storm.guava.util.concurrent.ExecutionList executeListener
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3:SEVERE: RuntimeException while executing
runnable org.apache.storm.guava.util.concurrent.Futures$4@3bdb38ba with
executor
org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@351cd1b1
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3:java.lang.RuntimeException: Failed to
connect to Netty-Client-ip-10-5-xx.xxxx/10.5.xx.xxx:6701
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
backtype.storm.messaging.netty.Client.connect(Client.java:300)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
backtype.storm.messaging.netty.Client.access$1100(Client.java:66)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:289)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:267)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.FutureTask.set(FutureTask.java:233)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.FutureTask.run(FutureTask.java:274)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
java.lang.Thread.run(Thread.java:745)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3:Caused by: java.lang.RuntimeException:
Giving up to connect to Netty-Client-ip-10-5-xx.xxxx/10.5.xx.xxx:6701 after 32
failed attempts
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: at
backtype.storm.messaging.netty.Client.connect(Client.java:295)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3: ... 19 more
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process
cf9fc058-0bda-40f7-8450-cb9037cb05c3:
2015-10-19T07:03:41.748-0400 b.s.d.supervisor [INFO] Worker Process
4366cfef-65fa-4385-b36a-eb4c272f4324 exited with code: 20
2015-10-19T07:03:42.611-0400 b.s.d.supervisor [INFO] Worker Process
4366cfef-65fa-4385-b36a-eb4c272f4324 has died!
2015-10-19T07:03:42.611-0400 b.s.d.supervisor [INFO] Shutting down and clearing
state for id 4366cfef-65fa-4385-b36a-eb4c272f4324. Current supervisor time:
1445252622. State: :timed-out, Heartbeat: {:time-secs 1445252620, :storm-id
"ambari_test3-27-1445251979", :executors [[47 47] [7 7] [51 51] [3 3] [39 39]
[35 35] [43 43] [63 63] [23 23] [11 11] [31 31] [-1 -1] [55 55] [19 19] [27 27]
[59 59] [15 15]], :port 6701}
------------------------
On disabling my buggy metrics consumer, the exception no longer appeared.
Hope this helps.
---
Vishnu Rao
> disconnectiong between workers
> ------------------------------
>
> Key: STORM-1022
> URL: https://issues.apache.org/jira/browse/STORM-1022
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Reporter: Jackson Chung
>
> We upgraded to 0.9.5 ando ran into the following exception. The supervisors
> did go down:
> 1 caution in our upgrade is we started a new nimbus, without any supervisors
> attached. Then we deployed topologies (from CICD). Next we build new
> supervisors and the supervisors will start on startup. However, in between
> the network service is restarted (due to hostname changed during the build <-
> chef). Just wanna throw this out in case this makes a difference.
> In other word, it could be that supervisors started, picked up work, then
> network restarted.
> {code}
> SEVERE: RuntimeException while executing runnable
> org.apache.storm.guava.util.concurrent.Futures$4@445058b with executor
> org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@691bc565
> java.lang.RuntimeException: Failed to connect to
> Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700
> at backtype.storm.messaging.netty.Client.connect(Client.java:308)
> at backtype.storm.messaging.netty.Client.access$1100(Client.java:78)
> at backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:297)
> at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:283)
> at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
> at org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
> at
> org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
> at
> org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at
> org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at
> org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
> at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
> at java.util.concurrent.FutureTask.set(FutureTask.java:233)
> at java.util.concurrent.FutureTask.run(FutureTask.java:274)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Giving up to connect to
> Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700
> after 102 failed attempts
> at backtype.storm.messaging.netty.Client.connect(Client.java:303)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)