GitHub user HeartSaVioR opened a pull request: https://github.com/apache/storm/pull/1528
STORM-1934 Fix race condition between sync-supervisor and sync-processes

* sync-supervisor now only downloads new topology code and writes the new local assignment
* shutting down workers and removing topology code is moved to sync-processes
* sync-processes does all of its work based on the local assignment and the allocated workers
* remove unused / unneeded code

Here are my test results for this patch:

* `mvn clean install` 5 times: did not hit the intermittent supervisor failure (STORM-1933)
  * will try more runs
* killed a worker via `kill`, `kill -9`, and `restart worker` from the UI: no issue restarting the worker
* rebalanced the topology to change workers (2 -> 3): to test the case where the new assignment has the same worker port but different executors compared to the currently assigned workers
  * worker is recognized as :disallowed, then killed & relaunched

Rebalance test in detail:

- Writing new assignment

```
6701 {:storm-id "test-topology2-4-1467185073", :executors ([7 7] [5 5] [3 3] [1 1]), :resources [0.0 0.0 0.0]},
6702 {:storm-id "test-topology2-4-1467185073", :executors ([6 6] [4 4] [2 2]), :resources [0.0 0.0 0.0]}
```

- Assigned executors:

```
6701 {:storm-id "test-topology2-4-1467185073", :executors [[7 7] [5 5] [3 3] [1 1]], :resources #object[org.apache.storm.generated.WorkerResources 0x40c4d31c "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]},
6702 {:storm-id "test-topology2-4-1467185073", :executors [[6 6] [4 4] [2 2]], :resources #object[org.apache.storm.generated.WorkerResources 0x4ba861f4 "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]}}
```

- Allocated:

```
"2e9bea10-02b7-4e55-88e7-b194b9917a63" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[3 3] [6 6] [-1 -1]], :port 6703}],
"4630c4bf-9786-47ff-9f3b-6b42d9781b9d" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[7 7] [1 1] [-1 -1] [4 4]], :port 6701}],
"b9a622d2-5e5b-4311-999c-8c8dd92da6b6" [:disallowed {:time-secs 1467185406, :storm-id "test-topology2-4-1467185073", :executors [[2 2] [-1 -1] [5 5]], :port 6702}]}
```

NOTE: Because of a forward reference, I had to move `sync-processes` to just before `mk-synchronize-supervisor`. The major changes are in `sync-processes`, so reviewers will need to compare before and after manually. Sorry about that.

Since supervisor.clj has already been ported to Java on the master branch, I will need time to read the ported code and apply the same change there. Please review and comment while I'm working against the master branch. Thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-1934-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1528

----
commit e5857e07838af888988691af39efbe415b9a2345
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2016-06-29T07:06:20Z

    STORM-1934 Fix race condition between sync-supervisor and sync-processes

    * sync-supervisor just downloads new topology code and writes new local assignment
    * shutting down workers and removing topology code is moved to sync-processes
    * sync-processes does all of jobs based on local assignment and allocated workers
    * remove unused / unneeded codes

----