GitHub user danny0405 opened a pull request:

    https://github.com/apache/storm/pull/2319

    [STORM-2693] Nimbus assignments promotion

    Storm currently does not handle large clusters (for example, thousands of supervisors) very well. In our production environment, topology submission and killing become very inefficient as the cluster grows. I looked into the current assignment strategy and found that it can be improved.
    
    The assignment improvement works as follows:
    1. Nimbus stores the assignments on local disk.
    2. On restart, or when a nimbus becomes the HA leader, it recovers the assignments from ZooKeeper to local disk.
    3. In every scheduling round, nimbus sends each supervisor its assignment through RPC (only nodes whose assignments changed are notified).
    4. In addition to the nimbus notifications, each supervisor syncs its assignments at a fixed interval (an RPC request to nimbus).
    5. Workers sync assignments only from the local supervisor (or from ZooKeeper when the local supervisor goes down).
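A minimal sketch of the "only changed nodes are notified" idea in step 3, assuming assignments are modelled as a map from a supervisor node ID to the set of slots assigned on it (a hypothetical simplification, not Storm's actual data structures). Nimbus compares the previous and current scheduling rounds and only pushes to nodes whose entry was added, removed or changed; the class and method names here are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: find supervisor nodes whose assignments changed between
 * two scheduling rounds, so only those nodes need an RPC push.
 * Assignments are modelled as nodeId -> set of assigned slots (e.g. "topo-1:6700").
 */
public class ChangedNodes {

    static Set<String> changedNodes(Map<String, Set<String>> previous,
                                    Map<String, Set<String>> current) {
        // Consider every node that appears in either round.
        Set<String> allNodes = new HashSet<>(previous.keySet());
        allNodes.addAll(current.keySet());
        Set<String> changed = new TreeSet<>(); // sorted for stable output
        for (String node : allNodes) {
            // A node counts as changed if it was added, removed, or its slots differ.
            if (!Objects.equals(previous.get(node), current.get(node))) {
                changed.add(node);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> before = new HashMap<>();
        before.put("sup-a", Set.of("topo-1:6700"));
        before.put("sup-b", Set.of("topo-1:6701"));
        Map<String, Set<String>> after = new HashMap<>();
        after.put("sup-a", Set.of("topo-1:6700")); // unchanged, not notified
        after.put("sup-b", Set.of("topo-2:6701")); // reassigned
        after.put("sup-c", Set.of("topo-2:6702")); // newly used node
        System.out.println(changedNodes(before, after)); // [sup-b, sup-c]
    }
}
```

With thousands of supervisors, a round that touches one topology then costs a handful of RPCs instead of a cluster-wide broadcast; the periodic pull in step 4 covers any push that is lost.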
    
    [Image: https://user-images.githubusercontent.com/7644508/30267044-d0758492-9713-11e7-87cc-09af890aced9.png]
    
    I have tested this in our cluster. With the new RPC-based assignment distribution, supervisors respond to assignment changes very quickly (within milliseconds) and efficiently (only nodes whose assignments changed are notified), while keeping the full robustness of the old ZooKeeper mode:
    1. When nimbus goes down, workers keep running (as before); when the leader starts up again, it syncs the assignments and resumes work.
    2. When a supervisor goes down, its workers keep running (they sync connections from ZooKeeper as before); when the supervisor comes back up, it simply syncs the assignments from nimbus.
    3. When ZooKeeper is unstable, assignment sync is not affected; only the heartbeats and the logical plan (StormBase and similar) are.
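The worker-side lookup order described above (local supervisor first, ZooKeeper when the supervisor is down) can be sketched as a simple fallback. This is a hypothetical illustration: the two sources are modelled as plain `Supplier`s, a thrown `RuntimeException` stands in for "supervisor unreachable", and none of the names correspond to real Storm classes.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the worker-side assignment lookup:
 * ask the local supervisor first, fall back to ZooKeeper if it is unreachable.
 */
public class AssignmentSource {

    static <T> T fetchWithFallback(Supplier<T> fromSupervisor, Supplier<T> fromZookeeper) {
        try {
            // Normal path: cheap local RPC to the supervisor on the same host.
            return fromSupervisor.get();
        } catch (RuntimeException supervisorDown) {
            // Local supervisor is down: read the assignment from ZooKeeper instead.
            return fromZookeeper.get();
        }
    }

    public static void main(String[] args) {
        Supplier<String> deadSupervisor = () -> {
            throw new IllegalStateException("connection refused");
        };
        Supplier<String> zk = () -> "assignment-from-zk";
        System.out.println(fetchWithFallback(deadSupervisor, zk)); // assignment-from-zk
        System.out.println(fetchWithFallback(() -> "assignment-from-supervisor", zk)); // assignment-from-supervisor
    }
}
```

Keeping ZooKeeper only on the failure path means the common case never touches it, which is why ZooKeeper instability affects heartbeats but not assignment sync.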
    
    This is my JIRA task: https://issues.apache.org/jira/browse/STORM-2693
    
    @HeartSaVioR, could you please help review this?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danny0405/storm schedule-promotion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2319
    
----
commit 9e7cde8c48e33811615ab4d36e1f5dad94e8499c
Author: chenyuzhao <chenyuz...@meituan.com>
Date:   2017-09-11T03:46:23Z

    add local assignment backend and assign assignments through RPC

----
