James Xu created STORM-80:
-----------------------------

             Summary: NPE caused by TridentBoltExecutor reusing TrackedBatches 
between batch groups
                 Key: STORM-80
                 URL: https://issues.apache.org/jira/browse/STORM-80
             Project: Apache Storm (Incubating)
          Issue Type: Bug
            Reporter: James Xu


https://github.com/nathanmarz/storm/issues/421

I'm seeing intermittent errors caused by SubtopologyBolt.execute being called 
with a BatchInfo whose ProcessorContext is set up for a different Batch Group. 
In particular I'm seeing null pointer exceptions from PartitionPersistProcessor 
because its state fields were never set up correctly.

The best I can tell the id key (IBatchID) being used for the _batches map in 
TridentBoltExecutor is not unique between batch groups. As a result the tracked 
batch will have been initialized for a different Batch Group and set of 
processors.

I hoped to be able to track down the source of this issue but can't determine 
where the BatchIDs are being added to the tuples.

If it matters, my topology has two streams each reading from their own 
OpaqueTransactionalKafka spout w/different topics.

Backtrace:

65108 [Thread-25] ERROR backtype.storm.daemon.executor - 
java.lang.RuntimeException: java.lang.NullPointerException
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87)
 ~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58)
 ~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) 
~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.daemon.executor$fn__3551$fn__3563$fn__3610.invoke(executor.clj:712)
 ~[storm-0.9.0-wip4.jar:na]
        at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) 
~[storm-0.9.0-wip4.jar:na]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
        at java.lang.Thread.run(Thread.java:722) [na:1.7.0_09]
Caused by: java.lang.NullPointerException: null
        at 
storm.trident.planner.processor.PartitionPersistProcessor.execute(PartitionPersistProcessor.java:59)
 ~[storm-0.9.0-wip4.jar:na]
        at 
storm.trident.planner.SubtopologyBolt$InitialReceiver.receive(SubtopologyBolt.java:189)
 ~[storm-0.9.0-wip4.jar:na]
        at 
storm.trident.planner.SubtopologyBolt.execute(SubtopologyBolt.java:129) 
~[storm-0.9.0-wip4.jar:na]
        at 
storm.trident.topology.TridentBoltExecutor.execute(TridentBoltExecutor.java:352)
 ~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.daemon.executor$fn__3551$tuple_action_fn__3553.invoke(executor.clj:607)
 ~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__3474.invoke(executor.clj:379)
 ~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.disruptor$clojure_handler$reify__3011.onEvent(disruptor.clj:43) 
~[storm-0.9.0-wip4.jar:na]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:84)
 ~[storm-0.9.0-wip4.jar:na]
        ... 6 common frames omitted

Also, I'm only seeing this in LocalCluster mode, not in production.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to