[ https://issues.apache.org/jira/browse/STORM-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-112:
-------------------------------
Component/s: storm-core
> Race condition between Topology Kill and Worker Timeout can crash supervisor
> ----------------------------------------------------------------------------
>
> Key: STORM-112
> URL: https://issues.apache.org/jira/browse/STORM-112
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Reporter: James Xu
>
> Recently, while testing on a single-node cluster, we saw a supervisor crash
> when a topology was killed. The supervisor came back up and recovered, so it
> was not a major problem, but when we dug into it we found what appears to be
> a race condition.
> https://github.com/nathanmarz/storm/issues/656
> When a topology is killed, the local assignments are reset and stormconf.ser
> is deleted immediately afterwards. At the same time, sync-process may already
> be running with old state indicating that a worker timed out and needs to be
> relaunched. launch-worker then tries to read the topology conf, which has
> already been deleted, and crashes.
> The following is a sanitized version of the supervisor log that shows this
> happening.
> https://gist.github.com/revans2/6282830
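
The interleaving described in the quoted report can be reproduced in a small toy model. The sketch below is not Storm's supervisor code (which is written in Clojure): the class ToySupervisor, its methods, and the "tolerant" launch path are hypothetical names made up for illustration. It only assumes what the description states, namely that a kill resets local assignments and deletes stormconf.ser, while a sync pass that captured state before the kill still tries to relaunch the worker. The tolerant variant shows one possible way to handle a missing conf and is not presented as the fix adopted for this issue.

```python
import os
import tempfile


class ToySupervisor:
    """Toy model of the race; NOT Storm's actual supervisor code."""

    def __init__(self, conf_path):
        self.conf_path = conf_path
        # Local assignments: topology id -> True while assigned to this node.
        self.assignments = {"topo-1": True}

    def snapshot_timed_out_workers(self):
        # Stands in for sync-process capturing state before the kill lands.
        return [t for t, assigned in self.assignments.items() if assigned]

    def kill_topology(self, topo_id):
        # Kill resets local assignments, then deletes the conf right away.
        self.assignments.pop(topo_id, None)
        os.remove(self.conf_path)

    def launch_worker_unsafe(self, topo_id):
        # Reads the conf unconditionally; raises if it was already deleted.
        with open(self.conf_path, "rb") as f:
            return f.read()

    def launch_worker_tolerant(self, topo_id):
        # Hypothetical mitigation: a missing conf means the topology is gone,
        # so skip the relaunch instead of crashing.
        try:
            with open(self.conf_path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


def run_race(launch_name):
    """Replay the bad interleaving deterministically with the given launcher."""
    with tempfile.TemporaryDirectory() as d:
        conf = os.path.join(d, "stormconf.ser")
        with open(conf, "wb") as f:
            f.write(b"conf")
        sup = ToySupervisor(conf)
        stale = sup.snapshot_timed_out_workers()  # old state, taken first
        sup.kill_topology("topo-1")               # kill wins the race
        launch = getattr(sup, launch_name)
        return [launch(t) for t in stale]         # relaunch from stale state
```

Calling run_race("launch_worker_unsafe") raises FileNotFoundError, mirroring the crash in the report, while run_race("launch_worker_tolerant") returns a list containing None because the relaunch is skipped. The ordering is forced here for clarity; in a real supervisor the two steps run concurrently and the failure would only occur intermittently.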
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)