James Xu created STORM-80:
-----------------------------
Summary: NPE caused by TridentBoltExecutor reusing TrackedBatches
between batch groups
Key: STORM-80
URL: https://issues.apache.org/jira/browse/STORM-80
Project: Apache Storm (Incubating)
Issue Type: Bug
Reporter: James Xu
https://github.com/nathanmarz/storm/issues/421
I'm seeing intermittent errors caused by SubtopologyBolt.execute being called
with a BatchInfo whose ProcessorContext is set up for a different Batch Group.
In particular I'm seeing null pointer exceptions from PartitionPersistProcessor
because its state fields were never set up correctly.
As best I can tell, the id key (IBatchID) being used for the _batches map in
TridentBoltExecutor is not unique between batch groups. As a result the tracked
batch will have been initialized for a different Batch Group and set of
processors.
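To illustrate the suspected collision, here is a minimal sketch. It is not Storm's actual code: the class BatchKeyCollisionSketch and the Tracked and GroupedKey types are hypothetical stand-ins, and it only assumes that tracked batches live in a map keyed by a batch id. With an id-only key, a batch from a second group picks up the entry created for the first group. A key that also carries the batch group keeps the two apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class BatchKeyCollisionSketch {
    // Hypothetical stand-in for per-batch state initialized for one batch group.
    static final class Tracked {
        final String group;
        Tracked(String group) { this.group = group; }
    }

    // Hypothetical composite key: batch group name plus batch id.
    static final class GroupedKey {
        final String group;
        final long batchId;
        GroupedKey(String group, long batchId) {
            this.group = group;
            this.batchId = batchId;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof GroupedKey)) return false;
            GroupedKey k = (GroupedKey) o;
            return batchId == k.batchId && group.equals(k.group);
        }
        @Override public int hashCode() { return Objects.hash(group, batchId); }
    }

    // Keyed by id only: the lookup for groupB finds the entry created for groupA.
    static String groupSeenWithIdOnlyKey() {
        Map<Long, Tracked> batches = new HashMap<>();
        batches.computeIfAbsent(1L, id -> new Tracked("groupA"));
        Tracked t = batches.computeIfAbsent(1L, id -> new Tracked("groupB"));
        return t.group;
    }

    // Keyed by (group, id): each group gets its own tracked state.
    static String groupSeenWithCompositeKey() {
        Map<GroupedKey, Tracked> batches = new HashMap<>();
        batches.computeIfAbsent(new GroupedKey("groupA", 1L), k -> new Tracked(k.group));
        Tracked t = batches.computeIfAbsent(new GroupedKey("groupB", 1L), k -> new Tracked(k.group));
        return t.group;
    }

    public static void main(String[] args) {
        System.out.println(groupSeenWithIdOnlyKey());    // prints "groupA"
        System.out.println(groupSeenWithCompositeKey()); // prints "groupB"
    }
}
```

In the id-only case the caller gets state set up for the wrong group, which would match the symptom of processor state fields never having been initialized for the batch actually being executed.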
I hoped to be able to track down the source of this issue but can't determine
where the BatchIDs are being added to the tuples.
If it matters, my topology has two streams each reading from their own
OpaqueTransactionalKafka spout with different topics.
Backtrace:
65108 [Thread-25] ERROR backtype.storm.daemon.executor -
java.lang.RuntimeException: java.lang.NullPointerException
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.daemon.executor$fn__3551$fn__3563$fn__3610.invoke(executor.clj:712) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) ~[storm-0.9.0-wip4.jar:na]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0_09]
Caused by: java.lang.NullPointerException: null
    at storm.trident.planner.processor.PartitionPersistProcessor.execute(PartitionPersistProcessor.java:59) ~[storm-0.9.0-wip4.jar:na]
    at storm.trident.planner.SubtopologyBolt$InitialReceiver.receive(SubtopologyBolt.java:189) ~[storm-0.9.0-wip4.jar:na]
    at storm.trident.planner.SubtopologyBolt.execute(SubtopologyBolt.java:129) ~[storm-0.9.0-wip4.jar:na]
    at storm.trident.topology.TridentBoltExecutor.execute(TridentBoltExecutor.java:352) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.daemon.executor$fn__3551$tuple_action_fn__3553.invoke(executor.clj:607) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.daemon.executor$mk_task_receiver$fn__3474.invoke(executor.clj:379) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.disruptor$clojure_handler$reify__3011.onEvent(disruptor.clj:43) ~[storm-0.9.0-wip4.jar:na]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:84) ~[storm-0.9.0-wip4.jar:na]
    ... 6 common frames omitted
Also, I'm only seeing this in LocalCluster mode, not in production.