[ https://issues.apache.org/jira/browse/STORM-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Li updated STORM-3620:
----------------------------
Description:
OutputCollector is not thread-safe in 2.x.
It can cause data corruption if multiple threads in the same executor call
OutputCollector to emit data at the same time:
1. Every executor has an instance of ExecutorTransfer
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146
2. Every ExecutorTransfer has its own serializer
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L44
3. Every executor has its own OutputCollector
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java#L146-L147
4. When OutputCollector is called to emit to remote workers, it uses
ExecutorTransfer to transfer the data
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L66
5. ExecutorTransfer in turn serializes the data
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java#L116
6. But the serializer is not thread-safe, and all threads in an executor share it
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L33-L43
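To make steps 4 to 6 concrete, here is a simplified, hypothetical model (not Storm code) of a serializer that clears and refills one shared output buffer on every call, which is what the linked KryoTupleSerializer lines appear to do with their Kryo output buffer. The class names SharedBufferSerializer and RaceDemo are made up for illustration. A real race depends on thread timing, so the sketch replays one bad interleaving of two threads, A and B, deterministically on a single thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a serializer that reuses one output buffer (not Storm code). */
class SharedBufferSerializer {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    // Step 1 of serialize(): wipe the shared buffer.
    void clear() {
        buf.reset();
    }

    // Step 2: append one field to the shared buffer.
    void write(String field) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        buf.write(bytes, 0, bytes.length);
    }

    // Step 3: copy out whatever the buffer holds right now.
    byte[] toBytes() {
        return buf.toByteArray();
    }

    /** Normal single-threaded path: clear, write every field, copy out. */
    byte[] serialize(String... fields) {
        clear();
        for (String f : fields) {
            write(f);
        }
        return toBytes();
    }
}

class RaceDemo {
    /** Replays one possible interleaving of threads A and B sharing the serializer. */
    static String interleaved() {
        SharedBufferSerializer s = new SharedBufferSerializer();
        s.clear();               // A: starts serializing ("a1", "a2")
        s.write("a1");           // A: writes its first field, then is preempted
        s.serialize("b1", "b2"); // B: clears A's partial data and serializes its own tuple
        s.write("a2");           // A: resumes on a buffer that B has overwritten
        return new String(s.toBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(interleaved()); // prints b1b2a2 instead of a1a2
    }
}
```

Thread A ends up sending B's fields followed by its own second field, which is the kind of silent corruption described above: nothing throws, the bytes are simply wrong.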
----
However, the documentation at http://storm.apache.org/releases/2.1.0/Concepts.html
states that OutputCollector is thread-safe:
{code:java}
Its perfectly fine to launch new threads in bolts that do processing
asynchronously. OutputCollector is thread-safe and can be called at any time.
{code}
We should either make OutputCollector thread-safe or update the documentation
so that it does not mislead users.
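Until OutputCollector itself is fixed, one possible user-side workaround is to funnel every emit from every thread through a single lock. The sketch below uses a hypothetical Emitter interface instead of Storm's real OutputCollector API so that it runs standalone; UnsafeEmitter stands in for a collector whose internals (like the shared serializer) are not thread-safe, and SynchronizedEmitter and SyncDemo are likewise illustrative names. It assumes that every call site, including the executor thread's own execute(), goes through the same wrapper; a lock that only some callers take does not prevent the race, and whether one lock covers all of Storm's internal shared state is an assumption, not something verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical minimal collector interface (not Storm's OutputCollector API). */
interface Emitter {
    void emit(List<Object> values);
}

/** Deliberately not thread-safe: reuses one scratch buffer, like a shared serializer. */
class UnsafeEmitter implements Emitter {
    private final StringBuilder scratch = new StringBuilder();
    final List<String> sent = new ArrayList<>();

    @Override
    public void emit(List<Object> values) {
        scratch.setLength(0);
        for (Object v : values) {
            scratch.append(v).append(',');
        }
        sent.add(scratch.toString());
    }
}

/** Lets only one thread at a time into the delegate. */
class SynchronizedEmitter implements Emitter {
    private final Emitter delegate;
    private final Object lock = new Object();

    SynchronizedEmitter(Emitter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void emit(List<Object> values) {
        synchronized (lock) {
            delegate.emit(values);
        }
    }
}

class SyncDemo {
    /** Emits `count` tuples from a pool of `threads` threads through the synchronized wrapper. */
    static List<String> run(int threads, int count) {
        UnsafeEmitter raw = new UnsafeEmitter();
        Emitter safe = new SynchronizedEmitter(raw);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < count; i++) {
            final int n = i;
            pool.execute(() -> safe.emit(Arrays.<Object>asList("id", n)));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("emits did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return raw.sent;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000).size()); // 1000
    }
}
```

The lock trades some throughput for correctness, since emits from different threads now run one at a time. The alternative the issue proposes, making OutputCollector thread-safe inside Storm, would spare every topology from having to add this wrapper itself.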
was:
OutputCollector is not thread-safe in 2.x.
It can cause data corruption if multiple threads in the same executor call
OutputCollector to emit data at the same time:
1. Every executor has an instance of ExecutorTransfer
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146
2. Every ExecutorTransfer has its own serializer
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L44
3. Every executor has its own OutputCollector
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java#L146-L147
4. When OutputCollector is called to emit to remote workers, it uses
ExecutorTransfer to transfer the data
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L66
5. ExecutorTransfer in turn serializes the data
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java#L116
6. But the serializer is not thread-safe, and all threads in an executor share it
https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L33-L43
----
However, the documentation at http://storm.apache.org/releases/2.1.0/Concepts.html
states that OutputCollector is thread-safe:
{code:java}
Its perfectly fine to launch new threads in bolts that do processing
asynchronously. OutputCollector is thread-safe and can be called at any time.
{code}
We should either make OutputCollector thread-safe or update the documentation
so that it does not mislead users.
> OutputCollector in Storm 2.x is not thread-safe
> -----------------------------------------------
>
> Key: STORM-3620
> URL: https://issues.apache.org/jira/browse/STORM-3620
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Ethan Li
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)