Ethan Li created STORM-3620:
-------------------------------

             Summary: OutputCollector in Storm 2.x is not thread-safe
                 Key: STORM-3620
                 URL: https://issues.apache.org/jira/browse/STORM-3620
             Project: Apache Storm
          Issue Type: Bug
            Reporter: Ethan Li


OutputCollector is not thread-safe in 2.x.

It can cause data corruption if multiple threads in the same executor call
OutputCollector to emit data at the same time:

1. Every executor has an instance of ExecutorTransfer
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146

2. Every ExecutorTransfer has its own serializer

https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L44

3. Every executor has its own outputCollector

https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java#L146-L147

4. When outputCollector is called to emit to remote workers, it uses 
ExecutorTransfer to transfer data

https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L66

5. The transfer then tries to serialize the data

https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java#L116

6. But the serializer is not thread-safe

https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L33-L43
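
To illustrate why steps 1 to 6 matter, here is a minimal sketch that does not use Storm at all. ReusedBufferSerializer is a hypothetical stand-in for a serializer that reuses a single output buffer across calls; the names, the String payload and the begin/write/finish split are assumptions made for illustration, not Storm's actual code. The sketch replays on one thread an interleaving that two unsynchronized threads sharing the serializer could produce.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in (not Storm code) for a serializer that reuses one
// output buffer across calls, which is what makes concurrent use unsafe.
class ReusedBufferSerializer {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Step 1 of serialize(): clear the shared buffer.
    void begin() {
        out.reset();
    }

    // Step 2: append the payload to the shared buffer.
    void write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    // Step 3: copy the buffer contents out as the serialized form.
    byte[] finish() {
        return out.toByteArray();
    }

    // The normal call sequence when only one thread uses the serializer.
    byte[] serialize(String value) {
        begin();
        write(value);
        return finish();
    }
}

class SharedSerializerHazard {
    // Replays, deterministically on one thread, an interleaving that two
    // unsynchronized threads A and B could produce on a shared serializer.
    static String interleavedResultForA(String a, String b) {
        ReusedBufferSerializer shared = new ReusedBufferSerializer();
        shared.begin();   // thread A starts serializing
        shared.write(a);  // thread A writes its payload
        shared.begin();   // thread B starts and clears A's bytes
        shared.write(b);  // thread B writes its payload
        // Thread A finishes and receives B's bytes instead of its own.
        return new String(shared.finish(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A asked to serialize "tuple-A" but gets "tuple-B" back.
        System.out.println(interleavedResultForA("tuple-A", "tuple-B"));
    }
}
```

In a real multi-threaded run the interleaving is not deterministic, so the corruption may appear only occasionally and can also take the form of mixed bytes from both payloads.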


However, the documentation at
http://storm.apache.org/releases/2.1.0/Concepts.html says that
OutputCollector is thread-safe:
{code:java}
Its perfectly fine to launch new threads in bolts that do processing 
asynchronously. OutputCollector is thread-safe and can be called at any time.
{code}


We should either fix OutputCollector to make it thread-safe, or update the
documentation so that it does not mislead users.
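
As one possible shape for the "make it thread-safe" option, here is a minimal sketch of serializing emits behind a lock. Emitter, UnsafeEmitter and SynchronizedEmitter are hypothetical names and are not part of Storm's API. The sketch only shows the locking pattern; it does not establish that this would be sufficient inside Storm, where the executor's own thread may also use the same transfer path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the emit side of a collector; not Storm's API.
interface Emitter {
    void emit(String value);
}

// Deliberately non-thread-safe sink: ArrayList does not support
// concurrent add() calls without external synchronization.
class UnsafeEmitter implements Emitter {
    final List<String> emitted = new ArrayList<>();

    @Override
    public void emit(String value) {
        emitted.add(value);
    }
}

// Wrapper that lets only one thread at a time reach the delegate.
class SynchronizedEmitter implements Emitter {
    private final Emitter delegate;
    private final Object lock = new Object();

    SynchronizedEmitter(Emitter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void emit(String value) {
        synchronized (lock) {
            delegate.emit(value);
        }
    }
}

class SynchronizedEmitDemo {
    // Emits perThread values from each of `threads` threads through the
    // synchronized wrapper and returns how many values reached the sink.
    static int emitFromThreads(int threads, int perThread) {
        UnsafeEmitter sink = new UnsafeEmitter();
        Emitter safe = new SynchronizedEmitter(sink);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    safe.emit(id + ":" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            // join() ensures all emits are visible before the sink is read.
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sink.emitted.size();
    }

    public static void main(String[] args) {
        // With the lock in place, no emitted value is lost.
        System.out.println(emitFromThreads(4, 1000));
    }
}
```

The lock trades some throughput for correctness; every caller that can reach the underlying collector has to go through the same wrapper for the guarantee to hold.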



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
