[ https://issues.apache.org/jira/browse/STORM-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Joseph Evans resolved STORM-2017.
----------------------------------------
Resolution: Fixed
Fix Version/s: 1.0.3, 1.1.0, 2.0.0
Thanks [~kluoto],
I merged this into master, 1.x-branch and 1.0.x-branch. Keep up the good work.
> ShellBolt stops reporting task ids
> ----------------------------------
>
> Key: STORM-2017
> URL: https://issues.apache.org/jira/browse/STORM-2017
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.1, 1.0.3
> Reporter: Lasse Kiviluoto
> Assignee: Lasse Kiviluoto
> Fix For: 2.0.0, 1.1.0, 1.0.3
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> After running enough flow through ShellBolt (in some cases after tens of
> minutes), ShellBolt stopped reporting task ids. After this error condition
> no new task ids were reported back. When acking of the tuples processed by
> the bolt was done in a callback tied to the arrival of the task ids, all
> tuple trees going through the bolt failed once reporting stopped. ShellBolt
> continued to process new tuples and respond to heartbeats.
> After running some tests and making some changes to the code, I have the
> following hypothesis for the cause:
> org.apache.storm.utils.ShellBoltMessageQueue has two queues, one for
> taskIds and the other for bolt messages.
> The taskIds queue is implemented as a LinkedList and the bolt message queue
> as a LinkedBlockingQueue. Both queues are operated similarly.
> One major difference between the two structures is that LinkedList is not
> synchronized.
> In the code:
> ShellBoltMessageQueue.java:58 calls the add method without holding the lock,
> whereas ShellBoltMessageQueue.java:110 calls the poll method while holding
> the lock. Because BoltReaderRunnable and BoltWriterRunnable in ShellBolt run
> concurrently, this can lead to a race condition.
> If I move the add call at ShellBoltMessageQueue.java:58 inside the lock and
> run the test in the same fashion, it seems to solve the issue.
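
For readers without the source at hand, the pattern described in the report can be sketched roughly as follows. This is a minimal illustration, not the actual ShellBoltMessageQueue code: the class name TaskIdQueueSketch, the field names, and the method signatures are assumptions made for the example. It only shows the idea of putting the LinkedList add under the same lock that the poll side already holds.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the taskIds half of the queue; names are illustrative.
class TaskIdQueueSketch {
    // LinkedList is not thread-safe, so every access must hold the lock.
    private final LinkedList<List<Integer>> taskIdsQueue = new LinkedList<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    // Writer side: the add happens inside the lock, as proposed in the report.
    void putTaskIds(List<Integer> taskIds) {
        lock.lock();
        try {
            taskIdsQueue.add(taskIds);
            notEmpty.signal(); // wake one waiting reader, if any
        } finally {
            lock.unlock();
        }
    }

    // Reader side: waits up to the timeout, returns null if nothing arrived.
    List<Integer> poll(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (taskIdsQueue.isEmpty()) {
                if (nanos <= 0) {
                    return null; // timed out
                }
                try {
                    nanos = notEmpty.awaitNanos(nanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return null;
                }
            }
            return taskIdsQueue.poll();
        } finally {
            lock.unlock();
        }
    }
}
```

With both the add and the poll guarded by the same lock, two threads can no longer modify the list's internal links at the same time, which is the kind of corruption an unsynchronized LinkedList is exposed to when a reader and a writer run concurrently.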