[ 
https://issues.apache.org/jira/browse/STORM-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126069#comment-17126069
 ] 

Ethan Li commented on STORM-3620:
---------------------------------

OutputCollector is not the only thing that is not thread-safe. There is also a 
metricsTick that uses ExecutorTransfer to transfer tuples. In a single-threaded 
bolt, all OutputCollector and metricsTick calls happen on one thread, which is 
fine. But if the bolt has multiple threads, ExecutorTransfer transfers can 
happen on different threads at the same time, and data can be corrupted because 
the serializer is not thread-safe.
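
To illustrate the failure mode, here is a minimal, self-contained sketch. It 
does not use Storm's classes: ReusableBufferSerializer, Emitter and 
countCorrupted are hypothetical names standing in for a serializer that reuses 
one buffer and for the transfer path that calls it. The assumption is only that 
the shared serializer keeps mutable state between calls, as described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedSerializerSketch {

    /** Hypothetical stand-in for a serializer that reuses one buffer (not Storm's class). */
    static final class ReusableBufferSerializer {
        private final StringBuilder buffer = new StringBuilder();

        String serialize(String streamId, int value) {
            buffer.setLength(0);      // resets state that every caller shares
            buffer.append(streamId);
            buffer.append(':');
            buffer.append(value);
            return buffer.toString();
        }
    }

    /** Hypothetical stand-in for the transfer path: one serializer per emitter. */
    static final class Emitter {
        private final ReusableBufferSerializer serializer = new ReusableBufferSerializer();
        private final Object lock = new Object();
        private final boolean locked;

        Emitter(boolean locked) {
            this.locked = locked;
        }

        String emit(String streamId, int value) {
            if (locked) {
                synchronized (lock) { // only one thread touches the buffer at a time
                    return serializer.serialize(streamId, value);
                }
            }
            return serializer.serialize(streamId, value); // racy when called concurrently
        }
    }

    /** Emits from several threads and counts results that differ from what was sent. */
    static int countCorrupted(int threads, int emitsPerThread, boolean locked) {
        Emitter emitter = new Emitter(locked);
        AtomicInteger corrupted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final String streamId = "stream-" + t;
            pool.execute(() -> {
                for (int i = 0; i < emitsPerThread; i++) {
                    String expected = streamId + ":" + i;
                    try {
                        if (!expected.equals(emitter.emit(streamId, i))) {
                            corrupted.incrementAndGet();
                        }
                    } catch (RuntimeException e) {
                        // A racing StringBuilder may also fail outright; count that too.
                        corrupted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return corrupted.get();
    }

    public static void main(String[] args) {
        // The unlocked count varies from run to run and may even be zero.
        System.out.println("unlocked corrupted: " + countCorrupted(4, 20000, false));
        System.out.println("locked corrupted:   " + countCorrupted(4, 20000, true));
    }
}
```

With the lock, every result matches what was sent. Note that in a real bolt, 
locking only around the user's own emit calls would likely not be enough, 
because metricsTick goes through the same ExecutorTransfer on the executor 
thread.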

> OutputCollector in Storm 2.x is not thread-safe
> -----------------------------------------------
>
>                 Key: STORM-3620
>                 URL: https://issues.apache.org/jira/browse/STORM-3620
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>
> OutputCollector is not thread-safe in 2.x. 
> It can cause data corruption if multiple threads in the same executor call 
> OutputCollector to emit data at the same time:
> 1. Every executor has an instance of ExecutorTransfer
> https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146
> 2. Every ExecutorTransfer has its own serializer
> https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L44
> 3. Every executor has its own outputCollector
> https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java#L146-L147
> 4. When outputCollector is called to emit to remote workers, it uses 
> ExecutorTransfer to transfer data
> https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L66
> 5. ExecutorTransfer then tries to serialize the data
> https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java#L116
> 6. But the serializer is not thread-safe
> https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L33-L43
> ----
> But the documentation at http://storm.apache.org/releases/2.1.0/Concepts.html 
> says that OutputCollector is thread-safe:
> {code:java}
> Its perfectly fine to launch new threads in bolts that do processing 
> asynchronously. OutputCollector is thread-safe and can be called at any time.
> {code}
> We should either make OutputCollector thread-safe or update the 
> documentation so that it does not mislead users.



