[https://issues.apache.org/jira/browse/STORM-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352992#comment-15352992]
Jungtaek Lim commented on STORM-1933:
-------------------------------------
If there's a race condition, sync-processes recognizes a disallowed worker and
removes that worker's directories, but sync-supervisor can recreate the worker's
heartbeat directory just after sync-processes removes the worker root.
sync-processes : shutting down worker
sync-processes : RMR heartbeat directory of worker
sync-supervisor : sync supervisor called
sync-processes : RMR pids directory
sync-supervisor : write new assignment
sync-supervisor : read workers directory to obtain the worker list (in kill-existing-workers-with-change-in-components)
sync-processes : RMR root directory (late!)
sync-processes : remove worker-user
sync-processes : read worker heartbeat by creating a LocalState, which refers to the
heartbeat directory. NOTE: this creates a VersionedStore, which creates the directory.
On its next run, sync-processes reads the workers directory to obtain the worker
list, and since the heartbeat directory has been recreated, the worker is
recognized as "not-started".
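The interleaving above can be reproduced deterministically with a small Java sketch. It is not Storm's code: rmr, readHeartbeat and listWorkers are hypothetical helpers, readHeartbeat only mimics the side effect described in the NOTE (opening the store creates its directory), and the workers/<id>/heartbeats and workers/<id>/pids layout is an assumption. Running the removals in the order sync-processes performs them, followed by the late heartbeat read, leaves worker-1 listed again:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class HeartbeatDirRace {

    // Hypothetical helper: recursive delete, standing in for the RMR steps.
    static void rmr(Path dir) throws IOException {
        if (!Files.exists(dir)) return;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes children before their parents.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Hypothetical stand-in for reading a heartbeat via LocalState/VersionedStore:
    // only models the side effect that the store's directory (and parents) get created.
    static void readHeartbeat(Path heartbeatDir) throws IOException {
        Files.createDirectories(heartbeatDir);
    }

    // Hypothetical helper: every subdirectory of workers/ is treated as a worker id.
    static List<String> listWorkers(Path workersDir) throws IOException {
        List<String> ids = new ArrayList<>();
        try (Stream<Path> s = Files.list(workersDir)) {
            s.filter(Files::isDirectory).forEach(p -> ids.add(p.getFileName().toString()));
        }
        return ids;
    }

    public static List<String> simulate() throws IOException {
        Path workers = Files.createTempDirectory("workers");
        Path root = workers.resolve("worker-1");
        Path heartbeats = root.resolve("heartbeats");
        Path pids = root.resolve("pids");
        Files.createDirectories(heartbeats);
        Files.createDirectories(pids);

        rmr(heartbeats);           // RMR heartbeat directory of worker
        rmr(pids);                 // RMR pids directory
        rmr(root);                 // RMR root directory (late!)
        readHeartbeat(heartbeats); // late heartbeat read recreates worker-1/heartbeats

        List<String> result = listWorkers(workers); // what the next run would see
        rmr(workers);                               // clean up the temp tree
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(simulate()); // prints [worker-1]
    }
}
```

Because the recreated directory is indistinguishable from a freshly launched worker's, anything that lists workers by scanning the directory will pick it up, which matches the "not-started" symptom described above.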
> Intermittent test failure on test-multiple-active-storms-multiple-supervisors for supervisor-test
> --------------------------------------------------------------------------------------------------
>
> Key: STORM-1933
> URL: https://issues.apache.org/jira/browse/STORM-1933
> Project: Apache Storm
> Issue Type: Sub-task
> Components: storm-core
> Affects Versions: 1.0.0, 2.0.0, 1.0.1
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Attachments:
> only-thread-1362-and-1363-BUG-60850-intermittent-failure-supervisor-test.txt
>
>
> test-multiple-active-storms-multiple-supervisors fails fairly often. I ran
> the 1.x branch unit tests 3 times and hit this issue, and users report a
> FileNotFound issue on the supervisor that seems to be related.
> I have a log file, which I'll attach once the issue is created.