GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/storm/pull/1528
STORM-1934 Fix race condition between sync-supervisor and sync-processes
* sync-supervisor just downloads new topology code and writes the new local
assignment
* shutting down workers and removing topology code is moved to
sync-processes
* sync-processes does all of its work based on the local assignment and
allocated workers
* remove unused / unneeded code
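As a rough illustration of this split, here is a small Python model. It is not the Clojure code in this patch: the function names mirror the ones above, but the `state` dictionary, its keys, and the worker naming are hypothetical and only meant to show which step owns which side effect.

```python
# Illustrative model only: names and data shapes are assumptions, not Storm's API.

def sync_supervisor(state, new_assignment):
    """Download code for assigned topologies, then record the local assignment."""
    for assigned in new_assignment.values():
        state["code"].add(assigned["storm_id"])  # stands in for the code download
    state["local_assignment"] = dict(new_assignment)


def sync_processes(state):
    """Reconcile workers and topology code against the local assignment."""
    assignment = state["local_assignment"]

    # Shut down workers whose port is unassigned or whose executors changed.
    for worker_id, worker in list(state["workers"].items()):
        wanted = assignment.get(worker["port"])
        if (wanted is None
                or wanted["storm_id"] != worker["storm_id"]
                or set(wanted["executors"]) != set(worker["executors"])):
            del state["workers"][worker_id]

    # Launch a worker for every assigned port that has none.
    used_ports = {worker["port"] for worker in state["workers"].values()}
    for port, wanted in assignment.items():
        if port not in used_ports:
            state["launched"] += 1
            state["workers"][f"worker-{state['launched']}"] = {
                "port": port,
                "storm_id": wanted["storm_id"],
                "executors": list(wanted["executors"]),
            }

    # Remove code for topologies that are no longer assigned.
    state["code"] &= {wanted["storm_id"] for wanted in assignment.values()}


state = {"local_assignment": {}, "workers": {}, "code": set(), "launched": 0}

sync_supervisor(state, {6701: {"storm_id": "topo", "executors": [(1, 1), (2, 2)]}})
sync_processes(state)   # launches worker-1 on port 6701

# Rebalance: same port, different executors.
sync_supervisor(state, {6701: {"storm_id": "topo", "executors": [(1, 1)]}})
sync_processes(state)   # kills worker-1, launches worker-2 on the same port
print(state["workers"])
```

In this model only `sync_processes` ever kills workers or deletes code, so the two steps cannot disagree about which workers should be running.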
Here are my test results for this patch:
* `mvn clean install` 5 times: did not hit the intermittent supervisor
failure (STORM-1933)
  * will try more times
* killed workers via `kill`, `kill -9`, and `restart worker` from the UI: no
issues restarting the workers
* rebalanced the topology to change the worker count (2 -> 3): this tests the
case where the new assignment has the same worker port but different
executors compared to the assigned workers
  * the worker is recognized as :disallowed, then killed and relaunched
Rebalance test in detail:
- Writing new assignment
```
6701 {:storm-id "test-topology2-4-1467185073", :executors ([7 7] [5 5] [3 3] [1 1]), :resources [0.0 0.0 0.0]},
6702 {:storm-id "test-topology2-4-1467185073", :executors ([6 6] [4 4] [2 2]), :resources [0.0 0.0 0.0]}
```
- Assigned executors:
```
6701 {:storm-id "test-topology2-4-1467185073", :executors [[7 7] [5 5] [3 3] [1 1]], :resources #object[org.apache.storm.generated.WorkerResources 0x40c4d31c "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]},
6702 {:storm-id "test-topology2-4-1467185073", :executors [[6 6] [4 4] [2 2]], :resources #object[org.apache.storm.generated.WorkerResources 0x4ba861f4 "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]}}
```
- Allocated:
```
"2e9bea10-02b7-4e55-88e7-b194b9917a63" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[3 3] [6 6] [-1 -1]], :port 6703}],
"4630c4bf-9786-47ff-9f3b-6b42d9781b9d" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[7 7] [1 1] [-1 -1] [4 4]], :port 6701}],
"b9a622d2-5e5b-4311-999c-8c8dd92da6b6" [:disallowed {:time-secs 1467185406, :storm-id "test-topology2-4-1467185073", :executors [[2 2] [-1 -1] [5 5]], :port 6702}]}
```
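To make it easier to see why all three workers end up as :disallowed, the comparison can be sketched in Python using the values from the dumps above. This is a hedged sketch, not the supervisor's actual check: `matches_assignment` and the dictionary field names are hypothetical, and it assumes the `[-1 -1]` entry in each heartbeat is a system executor that is ignored when comparing. Other worker states the supervisor may use are not modeled.

```python
# Sketch only: function and field names are hypothetical.
SYSTEM_EXECUTOR = (-1, -1)  # assumption: [-1 -1] is not a regular executor and is ignored
TOPOLOGY = "test-topology2-4-1467185073"


def matches_assignment(heartbeat, assigned):
    """True if the worker's port is assigned with the same topology and executors."""
    wanted = assigned.get(heartbeat["port"])
    if wanted is None:
        return False  # nothing is assigned to this port any more
    running = set(heartbeat["executors"]) - {SYSTEM_EXECUTOR}
    return (heartbeat["storm_id"] == wanted["storm_id"]
            and running == set(wanted["executors"]))


# New assignment after the rebalance (ports 6701 and 6702 from the dump).
assigned = {
    6701: {"storm_id": TOPOLOGY, "executors": [(7, 7), (5, 5), (3, 3), (1, 1)]},
    6702: {"storm_id": TOPOLOGY, "executors": [(6, 6), (4, 4), (2, 2)]},
}

# Heartbeats of the workers that were running before the rebalance took effect.
heartbeats = [
    {"port": 6703, "storm_id": TOPOLOGY, "executors": [(3, 3), (6, 6), (-1, -1)]},
    {"port": 6701, "storm_id": TOPOLOGY, "executors": [(7, 7), (1, 1), (-1, -1), (4, 4)]},
    {"port": 6702, "storm_id": TOPOLOGY, "executors": [(2, 2), (-1, -1), (5, 5)]},
]

states = {
    hb["port"]: "valid" if matches_assignment(hb, assigned) else "disallowed"
    for hb in heartbeats
}
print(states)  # {6703: 'disallowed', 6701: 'disallowed', 6702: 'disallowed'}
```

Port 6703 has no assignment at all, while 6701 and 6702 keep their ports but carry a different executor set, which is the "same port, different executors" case this test targets.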
NOTE: Because of a forward reference, I had to move `sync-processes` to just
before `mk-synchronize-supervisor`. The major changes are in `sync-processes`,
so reviewers will need to compare before and after manually. Sorry about that.
Since supervisor.clj has already been ported to Java on the master branch, I
will need time to read the ported code and bring it in sync with this change.
Please review and comment while I work on the master branch. Thanks!
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HeartSaVioR/storm STORM-1934-1.x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1528.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1528
----
commit e5857e07838af888988691af39efbe415b9a2345
Author: Jungtaek Lim <[email protected]>
Date: 2016-06-29T07:06:20Z
STORM-1934 Fix race condition between sync-supervisor and sync-processes
* sync-supervisor just downloads new topology code and writes new local
assignment
* shutting down workers and removing topology code is moved to
sync-processes
* sync-processes does all of jobs based on local assignment and allocated
workers
* remove unused / unneeded codes
----