Petr Janeček created STORM-3422:
-----------------------------------
Summary: TupleCaptureBolt seems to be not thread-safe
Key: STORM-3422
URL: https://issues.apache.org/jira/browse/STORM-3422
Project: Apache Storm
Issue Type: Bug
Components: storm-client
Affects Versions: 1.2.2, 2.0.0
Reporter: Petr Janeček
I am marking this as Major even though the problem lies in testing code: it
makes integration testing hard, but it does not affect any production code.
First, let me show you a stack trace for Storm 2.0.0:
{{java.lang.RuntimeException: java.lang.NullPointerException}}
{{ at org.apache.storm.executor.Executor.accept(Executor.java:282) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:133) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.utils.JCQueue.consume(JCQueue.java:110) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:171) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:158) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.utils.Utils$1.run(Utils.java:388) [storm-client-2.0.0.jar:2.0.0]}}
{{ at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]}}
{{Caused by: java.lang.NullPointerException}}
{{ at org.apache.storm.testing.TupleCaptureBolt.execute(TupleCaptureBolt.java:45) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:234) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ at org.apache.storm.executor.Executor.accept(Executor.java:275) ~[storm-client-2.0.0.jar:2.0.0]}}
{{ ... 6 more}}
Here's the same for Storm 1.2.2:
{{java.lang.RuntimeException: java.lang.NullPointerException}}
{{ at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:522) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:487) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:74) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.daemon.executor$fn__10795$fn__10808$fn__10861.invoke(executor.clj:861) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) [storm-core-1.2.2.jar:1.2.2]}}
{{ at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]}}
{{ at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]}}
{{Caused by: java.lang.NullPointerException}}
{{ at org.apache.storm.testing.TupleCaptureBolt.execute(TupleCaptureBolt.java:50) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.daemon.executor$fn__10795$tuple_action_fn__10797.invoke(executor.clj:739) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.daemon.executor$mk_task_receiver$fn__10716.invoke(executor.clj:468) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.disruptor$clojure_handler$reify__10135.onEvent(disruptor.clj:41) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:509) ~[storm-core-1.2.2.jar:1.2.2]}}
{{ ... 6 more}}
Both traces come from a topology that we run as an integration test through
{{Testing.completeTopology()}}. They point to the same code in
{{TupleCaptureBolt}}, where I see two likely problems: the {{name}} field is
not safely published (it should be marked {{final}}), and the internal
{{HashMap}} is not safe for concurrent access, so data put into it by one
thread may not be visible to another. Perhaps it should be a
{{ConcurrentHashMap}}?
Would you accept a PR with a more detailed analysis, or are you going to
investigate on your side?
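To illustrate the suggestion, here is a minimal sketch of the pattern I have in mind. It is not the {{TupleCaptureBolt}} source and does not use any Storm classes: {{SafeCapture}}, {{CAPTURED}}, {{capture()}} and {{demo()}} are hypothetical names, and plain objects stand in for tuples. It only shows a {{final}} name field combined with a {{ConcurrentHashMap}} registry, under the assumption that several threads may write captured values at the same time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a capturing bolt; NOT Storm's actual class. */
final class SafeCapture {
    // Shared registry: capture name -> source component -> captured values.
    // ConcurrentHashMap makes concurrent put/get on the registry safe.
    static final Map<String, Map<String, List<Object>>> CAPTURED = new ConcurrentHashMap<>();

    // A final field is guaranteed to be visible to other threads once the
    // constructor has finished (assuming 'this' does not escape earlier).
    private final String name;

    SafeCapture() {
        this.name = UUID.randomUUID().toString();
        CAPTURED.put(name, new ConcurrentHashMap<>());
    }

    String name() {
        return name;
    }

    // Stand-in for execute(Tuple): records one value for a source component.
    void capture(String component, Object value) {
        CAPTURED.get(name)
                // Atomically creates the per-component list on first use.
                .computeIfAbsent(component, k -> Collections.synchronizedList(new ArrayList<>()))
                .add(value);
    }

    // Small concurrent exercise: several threads capture into the same instance.
    // Returns the number of values recorded for the "spout" component.
    static int demo(int threads, int perThread) {
        SafeCapture capture = new SafeCapture();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    capture.capture("spout", i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return CAPTURED.get(capture.name()).get("spout").size();
    }
}
```

With this structure, {{SafeCapture.demo(4, 1000)}} should record all 4000 values without losing any. Whether the same change is sufficient for {{TupleCaptureBolt}} itself would still need to be checked against the real code.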