[ 
https://issues.apache.org/jira/browse/STORM-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963167#comment-14963167
 ] 

vishnu rao commented on STORM-1022:
-----------------------------------

I came across the same exception while using Storm '0.10.0.2.3.0.0-2557' on 
Hortonworks Data Platform 2.3.

Before this exception hit, a runtime exception was thrown, which caused the 
supervisor to restart the workers and thereby led to 
'SEVERE: RuntimeException while executing runnable 
org.apache.storm.guava.util.concurrent.Futures'. 

Here is my scenario:

(1) I registered a metrics consumer for my topology:
conf.registerMetricsConsumer(NewRelicMetricConsumer.class,10);

(2) The topology starts up (in the example below, at 2015-10-19T06:53):
2015-10-19T06:53:17.757-0400 b.s.d.supervisor [INFO] 
6afff1fd-4741-49a8-bd38-769146e28d81 still hasn't started

(3) The metrics consumer threw a RuntimeException (wrapping a 
ClassCastException), which resulted in Storm's "Async loop died!" error and 
the worker dying:
-----------------------------------------------------------------------------------------------
2015-10-19 06:54:31 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.ClassCastException: java.util.HashMap 
cannot be cast to java.lang.Number
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to 
java.lang.Number
        at 
com.pocketmath.nebula.metric.storm.NewRelicMetricConsumer.handleDataPoints(NewRelicMetricConsumer.java:55)
 ~[stormjar.jar:na]
        at 
backtype.storm.metric.MetricsConsumerBolt.execute(MetricsConsumerBolt.java:55) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        ... 6 common frames omitted
2015-10-19 06:54:31 b.s.d.executor [ERROR] 
java.lang.RuntimeException: java.lang.ClassCastException: java.util.HashMap 
cannot be cast to java.lang.Number
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.ClassCastException: java.util.HashMap cannot be cast to 
java.lang.Number
        at 
com.test.metric.storm.NewRelicMetricConsumer.handleDataPoints(NewRelicMetricConsumer.java:55)
 ~[stormjar.jar:na]
        at 
backtype.storm.metric.MetricsConsumerBolt.execute(MetricsConsumerBolt.java:55) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
        ... 6 common frames omitted
-----------------------------------------------------------------------------------------------

(4) The supervisor detects the worker's death and relaunches it:

2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Worker Process 
6afff1fd-4741-49a8-bd38-769146e28d81 has died!
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Shutting down and clearing 
state for id 6afff1fd-4741-49a8-bd38-769146e28d81. Current supervisor time: 
1445252082. State: :timed-out, Heartbeat: {:time-secs 1445252079, :storm-id 
"ambari_test3-27-1445251979", :executors [[41 41] [33 33] [1 1] [53 53] [65 65] 
[9 9] [57 57] [-1 -1] [61 61] [13 13] [21 21] [5 5] [29 29] [45 45] [37 37] [25 
25] [49 49] [17 17]], :port 6700}
2015-10-19T06:54:42.557-0400 b.s.d.supervisor [INFO] Shutting down 
5f60d8c8-4895-4dab-ae04-709434648e26:6afff1fd-4741-49a8-bd38-769146e28d81

2015-10-19T06:54:43.567-0400 b.s.d.supervisor [INFO] Launching worker with 
command: ..........................

-----------------------------------------------------------------------------------------------
(5) About eight and a half minutes later (07:03:15), while still waiting for 
the worker to start, we get the exception:

2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:Oct 19, 2015 7:03:15 AM 
org.apache.storm.guava.util.concurrent.ExecutionList executeListener
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:SEVERE: RuntimeException while executing 
runnable org.apache.storm.guava.util.concurrent.Futures$4@3bdb38ba with 
executor 
org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@351cd1b1
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:java.lang.RuntimeException: Failed to 
connect to Netty-Client-ip-10-5-xx.xxxx/10.5.xx.xxx:6701
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
backtype.storm.messaging.netty.Client.connect(Client.java:300)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
backtype.storm.messaging.netty.Client.access$1100(Client.java:66)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:289)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:267)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.FutureTask.set(FutureTask.java:233)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.FutureTask.run(FutureTask.java:274)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
java.lang.Thread.run(Thread.java:745)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:Caused by: java.lang.RuntimeException: 
Giving up to connect to Netty-Client-ip-10-5-xx.xxxx/10.5.xx.xxx:6701 after 32 
failed attempts
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       at 
backtype.storm.messaging.netty.Client.connect(Client.java:295)
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:       ... 19 more
2015-10-19T07:03:15.557-0400 b.s.util [WARN] Worker Process 
cf9fc058-0bda-40f7-8450-cb9037cb05c3:
2015-10-19T07:03:41.748-0400 b.s.d.supervisor [INFO] Worker Process 
4366cfef-65fa-4385-b36a-eb4c272f4324 exited with code: 20
2015-10-19T07:03:42.611-0400 b.s.d.supervisor [INFO] Worker Process 
4366cfef-65fa-4385-b36a-eb4c272f4324 has died!
2015-10-19T07:03:42.611-0400 b.s.d.supervisor [INFO] Shutting down and clearing 
state for id 4366cfef-65fa-4385-b36a-eb4c272f4324. Current supervisor time: 
1445252622. State: :timed-out, Heartbeat: {:time-secs 1445252620, :storm-id 
"ambari_test3-27-1445251979", :executors [[47 47] [7 7] [51 51] [3 3] [39 39] 
[35 35] [43 43] [63 63] [23 23] [11 11] [31 31] [-1 -1] [55 55] [19 19] [27 27] 
[59 59] [15 15]], :port 6701}

------------------------

On disabling my buggy metrics consumer, the exception no longer appeared.
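
For anyone hitting the same ClassCastException: in Storm's IMetricsConsumer 
API the value of a DataPoint is declared as Object, and some metrics (counts 
keyed by stream, for example) may arrive as a Map rather than a single Number, 
so an unconditional cast to Number can fail as in the trace above. Below is a 
minimal, Storm-free sketch of a guarded conversion. The class MetricValues and 
its flatten method are hypothetical names used only for illustration; they are 
not part of Storm or of the NewRelicMetricConsumer above. A real consumer 
would presumably call something like this for each data point inside 
handleDataPoints.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Storm. */
final class MetricValues {
    private MetricValues() {}

    /**
     * Flattens a metric value into name -> double pairs.
     * A Number yields one entry; a Map yields one entry per key
     * (recursively), named "parent.key"; anything else is skipped.
     */
    static Map<String, Double> flatten(String name, Object value) {
        Map<String, Double> out = new LinkedHashMap<>();
        collect(name, value, out);
        return out;
    }

    private static void collect(String name, Object value, Map<String, Double> out) {
        if (value instanceof Number) {
            out.put(name, ((Number) value).doubleValue());
        } else if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                collect(name + "." + e.getKey(), e.getValue(), out);
            }
        }
        // null and any other type are ignored instead of being cast
    }

    public static void main(String[] args) {
        // A map-valued metric, the shape that triggered the exception
        Map<String, Object> perStream = new LinkedHashMap<>();
        perStream.put("default", 42L);
        perStream.put("errors", 3);
        System.out.println(flatten("emit-count", perStream));
        // prints {emit-count.default=42.0, emit-count.errors=3.0}

        // A plain numeric metric
        System.out.println(flatten("latency", 1.5));
        // prints {latency=1.5}
    }
}
```

Whether to flatten, skip, or log non-numeric values is a design choice; the 
only point of the sketch is that the cast is guarded, so an unexpected value 
type does not throw out of the consumer and take the worker down with it.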

Hope this helps.

---
Vishnu Rao

> disconnectiong between workers
> ------------------------------
>
>                 Key: STORM-1022
>                 URL: https://issues.apache.org/jira/browse/STORM-1022
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>            Reporter: Jackson Chung
>
> We upgraded to 0.9.5 and ran into the following exception. The supervisors 
> did go down.
> One caveat about our upgrade: we started a new nimbus without any supervisors 
> attached, then deployed topologies (from CI/CD). Next we built new 
> supervisors, which start automatically on boot. In between, however, the 
> network service was restarted (because Chef changed the hostname during the 
> build). I am mentioning this in case it makes a difference.
> In other words, it could be that the supervisors started, picked up work, and 
> then the network restarted. 
> {code}
> SEVERE: RuntimeException while executing runnable 
> org.apache.storm.guava.util.concurrent.Futures$4@445058b with executor 
> org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService@691bc565
> java.lang.RuntimeException: Failed to connect to 
> Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700
> at backtype.storm.messaging.netty.Client.connect(Client.java:308)
> at backtype.storm.messaging.netty.Client.access$1100(Client.java:78)
> at backtype.storm.messaging.netty.Client$2.reconnectAgain(Client.java:297)
> at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:283)
> at backtype.storm.messaging.netty.Client$2.onSuccess(Client.java:275)
> at org.apache.storm.guava.util.concurrent.Futures$4.run(Futures.java:1181)
> at 
> org.apache.storm.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
> at 
> org.apache.storm.guava.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at 
> org.apache.storm.guava.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at 
> org.apache.storm.guava.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
> at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
> at java.util.concurrent.FutureTask.set(FutureTask.java:233)
> at java.util.concurrent.FutureTask.run(FutureTask.java:274)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Giving up to connect to 
> Netty-Client-usw2b-grunt-drone32-prod.amz.relateiq.com/10.30.103.202:6700 
> after 102 failed attempts
> at backtype.storm.messaging.netty.Client.connect(Client.java:303)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
