[ 
https://issues.apache.org/jira/browse/STORM-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496360#comment-14496360
 ] 

Michael Pershyn commented on STORM-770:
---------------------------------------

I observed the same issue in a very similar environment:

- the bolt reads the default streams from 2 other bolts, with grouping on 2 
fields
- performs heavy CPU operations and also asynchronous I/O operations 
(core.async)
- produces no tuples, only some metrics
- uses a core.async/go CSP thread to do the fails/acks of tuples: tuples to 
fail/ack are passed over a core.async channel, and a single go-routine reads 
the channel and does the ack/fail. // I suspect there may be a thread-safety 
problem here, but I am not sure. It worked for a long time without issues.
- Very importantly, at the time of the failure and for about 30 minutes before 
it, the topology was running near its capacity limit at that scale 
(performance tests). So assume there was always data in those 2 streams. 
At half that rate the issue is not observed.
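
To make the ack/fail hand-off in the list above concrete, here is a minimal 
sketch in Java. It is not the actual bolt code (which uses core.async), and it 
is not a confirmed fix for this issue. It assumes, without confirmation in this 
thread, that calling ack/fail from a thread other than the executor thread may 
be unsafe, so async callbacks only enqueue their result and the thread that 
owns the collector drains the queue. `AckFunnel`, `Collector`, `Decision`, 
`report` and `drain` are hypothetical names; `Collector` is a stand-in, not 
Storm's `OutputCollector`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class AckFunnel {
    /** Hypothetical stand-in for the ack/fail part of a Storm collector. */
    interface Collector {
        void ack(String tupleId);
        void fail(String tupleId);
    }

    /** One pending ack/fail decision produced by an async callback. */
    static final class Decision {
        final String tupleId;
        final boolean ok;

        Decision(String tupleId, boolean ok) {
            this.tupleId = tupleId;
            this.ok = ok;
        }
    }

    private final ConcurrentLinkedQueue<Decision> pending =
            new ConcurrentLinkedQueue<>();
    private final Collector collector;

    AckFunnel(Collector collector) {
        this.collector = collector;
    }

    /** Safe to call from any thread: it only enqueues the decision. */
    void report(String tupleId, boolean ok) {
        pending.add(new Decision(tupleId, ok));
    }

    /**
     * Call only from the thread that owns the collector, for example at the
     * start of each execute() call. Returns how many decisions were applied.
     */
    int drain() {
        int applied = 0;
        Decision d;
        while ((d = pending.poll()) != null) {
            if (d.ok) {
                collector.ack(d.tupleId);
            } else {
                collector.fail(d.tupleId);
            }
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) throws InterruptedException {
        final List<String> log = new ArrayList<>();
        AckFunnel funnel = new AckFunnel(new Collector() {
            public void ack(String id) { log.add("ack " + id); }
            public void fail(String id) { log.add("fail " + id); }
        });

        // Simulated async I/O thread: it reports results but never touches
        // the collector itself.
        Thread io = new Thread(() -> {
            funnel.report("t1", true);
            funnel.report("t2", false);
        });
        io.start();
        io.join();

        // The owning thread applies the queued decisions.
        funnel.drain();
        System.out.println(log);
    }
}
```

With this layout the collector is only ever touched by one fixed thread. 
Whether that matters for the NullPointerException reported here is an open 
question.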

Also, "the exception happened all of a sudden, there were no previous warnings 
or indications of distress in the logs." The topology recovered within several 
minutes after the load was reduced back to "normal". No rebalance, redeploy, 
or node restarts were performed.

I have also collected all the logs; let me know if they would be useful. 
Here is the part that is missing from the logs above: 
{code}
// Assumed the java.lang.RuntimeException: java.lang.NullPointerException happened at 14:32:30.682
// supervisor.log
2015-04-15T14:33:02.360 b.s.d.supervisor [INFO] Shutting down and clearing state for id 9acc8f8a-4593-4798-87d2-5dfbffc5ac39. Current supervisor time: 1429101182. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429101151, :storm-id "changed-topology-name-380-1428677575", :executors #{[30 30] [-1 -1]}, :port 6709}
2015-04-15T14:33:02.364 b.s.util [INFO] Error when trying to kill 42221. Process is probably already dead.
{code}


> NullPointerException in consumeBatchToCursor
> --------------------------------------------
>
>                 Key: STORM-770
>                 URL: https://issues.apache.org/jira/browse/STORM-770
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Stas Levin
>
> We got the following exception after our topology had been up for ~2 days, 
> and I was wondering if it might be related. 
> Looks like "task" in "mk-transfer-fn" is null, making "(.add remote 
> (TaskMessage. task (.serialize serializer tuple)))" fail on NPE 
> (worker.clj:128, storm-core-0.9.2-incubating.jar)
> java.lang.RuntimeException: java.lang.NullPointerException
> at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
> Caused by: java.lang.NullPointerException: null
> at clojure.lang.RT.intCast(RT.java:1087) ~[clojure-1.5.1.jar:na]
> at backtype.storm.daemon.worker$mk_transfer_fn$fn__5748.invoke(worker.clj:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.daemon.executor$start_batch_transfer_GT_worker_handler_BANG$fn__5483.invoke(executor.clj:256) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
> ... 6 common frames omitted,java.lang.RuntimeException: java.lang.NullPointerException
> Any ideas?
> P.S.
> Also saw it here: 
> http://mail-archives.apache.org/mod_mbox/storm-user/201501.mbox/%3CCABcMBhCusXXU=v1e66wfuatgyh1euqnd1siog65-tp8xlwx...@mail.gmail.com%3E
> https://mail-archives.apache.org/mod_mbox/storm-user/201408.mbox/%3ccajuqm_4kxhsh2_x08ujuqr76m2c+dswp0fcijbmfcaeyqgs...@mail.gmail.com%3E
> Comment from Bobby
> http://mail-archives.apache.org/mod_mbox/storm-user/201501.mbox/%3c574363643.2791948.1420470097280.javamail.ya...@jws10027.mail.ne1.yahoo.com%3E
> {quote}
> What version of storm are you using?  Are any of the bolts shell bolts?  
> There is a known issue where this can happen if two shell bolts share an 
> executor, because they are multi-threaded. 
> - Bobby
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
