GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/1528

    STORM-1934 Fix race condition between sync-supervisor and sync-processes

    * sync-supervisor now only downloads new topology code and writes the new local assignment
      * shutting down workers and removing topology code are moved to sync-processes
    * sync-processes does all of its work based on the local assignment and the allocated workers
    * remove unused / unneeded code
    
    Here's my test result for this patch:
    
    * `mvn clean install` 5 times: did not hit the intermittent supervisor failure (STORM-1933)
      * will try more runs
    * killed a worker via `kill`, `kill -9`, and `restart worker` from the UI: no issue restarting the worker
    * rebalanced the topology to change the number of workers (2 -> 3): to test the case where the new assignment has the same worker port but different executors compared to the already-assigned workers
      * each worker is recognized as :disallowed, then killed and relaunched
    
    Rebalance test in detail:
    
    - Writing new assignment
    ```
    6701 {:storm-id "test-topology2-4-1467185073", :executors ([7 7] [5 5] [3 3] [1 1]), :resources [0.0 0.0 0.0]},
    6702 {:storm-id "test-topology2-4-1467185073", :executors ([6 6] [4 4] [2 2]), :resources [0.0 0.0 0.0]}
    ```
    
    - Assigned executors:
    ```
    6701 {:storm-id "test-topology2-4-1467185073", :executors [[7 7] [5 5] [3 3] [1 1]], :resources #object[org.apache.storm.generated.WorkerResources 0x40c4d31c "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]},
    6702 {:storm-id "test-topology2-4-1467185073", :executors [[6 6] [4 4] [2 2]], :resources #object[org.apache.storm.generated.WorkerResources 0x4ba861f4 "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]}}
    ```
    
    - Allocated:
    ```
    "2e9bea10-02b7-4e55-88e7-b194b9917a63" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[3 3] [6 6] [-1 -1]], :port 6703}],
    "4630c4bf-9786-47ff-9f3b-6b42d9781b9d" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[7 7] [1 1] [-1 -1] [4 4]], :port 6701}],
    "b9a622d2-5e5b-4311-999c-8c8dd92da6b6" [:disallowed {:time-secs 1467185406, :storm-id "test-topology2-4-1467185073", :executors [[2 2] [-1 -1] [5 5]], :port 6702}]}
    ```
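
The rebalance output above can be read as a set comparison. The following is a minimal Java sketch, not the code from this patch. It assumes a worker matches only when its heartbeat carries the same storm-id and the same executor set as the local assignment for its port, after dropping the system executor `[-1 -1]` that shows up in heartbeats but not in assignments. The class `WorkerSyncSketch`, the `Slot` type, and both method names are hypothetical, and only the :valid / :disallowed outcomes are modeled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for illustration; not the supervisor code in this PR.
final class WorkerSyncSketch {
    // Assumption: the system executor is reported as [-1 -1] in heartbeats only.
    static final List<Integer> SYSTEM_EXECUTOR = Arrays.asList(-1, -1);

    // Minimal stand-in for either a local assignment or a worker heartbeat.
    static final class Slot {
        final String stormId;
        final Set<List<Integer>> executors = new HashSet<>();

        Slot(String stormId, int[][] executorRanges) {
            this.stormId = stormId;
            for (int[] range : executorRanges) {
                executors.add(Arrays.asList(range[0], range[1]));
            }
        }
    }

    // True when the heartbeat belongs to the same topology and runs exactly
    // the assigned executors (ignoring the system executor).
    static boolean matchesAssignment(Slot assigned, Slot heartbeat) {
        if (assigned == null || !assigned.stormId.equals(heartbeat.stormId)) {
            return false;
        }
        Set<List<Integer>> running = new HashSet<>(heartbeat.executors);
        running.remove(SYSTEM_EXECUTOR);
        return running.equals(assigned.executors);
    }

    // Looks up the assignment for the worker's port and classifies the worker.
    // A port with no assignment yields ":disallowed".
    static String classify(Map<Integer, Slot> localAssignment, int port, Slot heartbeat) {
        return matchesAssignment(localAssignment.get(port), heartbeat)
                ? ":valid"
                : ":disallowed";
    }
}
```

With the data above, port 6701 is assigned `[7 7] [5 5] [3 3] [1 1]` while the running worker reports `[7 7] [1 1] [4 4]` plus the system executor, so under these assumptions `classify` returns `:disallowed`. Port 6703 has no entry in the new assignment and is `:disallowed` as well.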
    
    NOTE: Because of a forward reference, I had to move `sync-processes` to just before `mk-synchronize-supervisor`. The major changes are in sync-processes, so reviewers will need to compare before and after manually. Sorry about that.
    
    Since supervisor.clj has already been ported to Java in the master branch, I will need time to read the ported code and update it to stay in sync.
    
    Please review and comment while I work against the master branch. Thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-1934-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1528
    
----
commit e5857e07838af888988691af39efbe415b9a2345
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2016-06-29T07:06:20Z

    STORM-1934 Fix race condition between sync-supervisor and sync-processes
    
    * sync-supervisor now only downloads new topology code and writes the new local assignment
      * shutting down workers and removing topology code are moved to sync-processes
    * sync-processes does all of its work based on the local assignment and the allocated workers
    * remove unused / unneeded code

----


