GitHub user danny0405 opened a pull request:
https://github.com/apache/storm/pull/2319
[STORM-2693] Nimbus assignments promotion
Storm currently does not handle large clusters (for example, thousands of
supervisors) very well. In our production environment, topology submission and
killing become very inefficient as the cluster grows. I looked into the current
assignment strategy and found that it can be improved.
For the assignment improvement:
1. nimbus will store the assignments on local disk
2. on restart, or when an HA leader switch is triggered, nimbus will recover
the assignments from zk to local disk
3. nimbus will send each supervisor its assignments through RPC every
scheduling round (only nodes whose assignments changed will be notified)
4. in addition to the nimbus notification, supervisors will sync assignments
at a fixed interval (RPC request to nimbus)
5. workers will sync assignments only from the local supervisor (or from
zookeeper when the local supervisor is down)
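
To illustrate step 3, here is a minimal sketch of how the set of nodes to
notify could be derived by comparing two scheduling rounds. It is not code
from this patch: the class `AssignmentDiff`, the method `changedNodes`, and
the use of plain strings for node ids and serialized assignments are all
assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the patch: finds the nodes whose
// assignment differs between the previous and the current scheduling round.
final class AssignmentDiff {
    private AssignmentDiff() {}

    /**
     * @param previous node id -> serialized assignment from the last round
     * @param current  node id -> serialized assignment from this round
     * @return sorted ids of nodes whose assignment was added, removed, or changed
     */
    static Set<String> changedNodes(Map<String, String> previous,
                                    Map<String, String> current) {
        // Consider every node seen in either round.
        Set<String> allNodes = new HashSet<>(previous.keySet());
        allNodes.addAll(current.keySet());

        Set<String> changed = new TreeSet<>();
        for (String node : allNodes) {
            // A node missing from one map yields null, which counts as a change.
            if (!Objects.equals(previous.get(node), current.get(node))) {
                changed.add(node);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new HashMap<>();
        previous.put("supervisor-1", "topo-a:6700");
        previous.put("supervisor-2", "topo-a:6701");

        Map<String, String> current = new HashMap<>();
        current.put("supervisor-1", "topo-a:6700"); // unchanged, not notified
        current.put("supervisor-2", "topo-b:6701"); // changed
        current.put("supervisor-3", "topo-b:6700"); // newly assigned

        // Only these nodes would receive an RPC notification in this round.
        System.out.println(changedNodes(previous, current));
    }
}
```

Unchanged nodes are skipped here, so the RPC fan-out per round scales with the
size of the change rather than the size of the cluster.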
<img width="603" alt="2fa30cd8-af15-4352-992d-a67bd724e7fb"
src="https://user-images.githubusercontent.com/7644508/30267044-d0758492-9713-11e7-87cc-09af890aced9.png">
I have tested this in our cluster. With the new RPC-based assignment
distribution, supervisors respond to assignment changes quickly (within
milliseconds) and efficiently (only nodes whose assignments changed are
notified). It also keeps the full robustness of the old zookeeper mode:
1. when nimbus goes down, workers keep working (as before); when the
leader starts up, it will sync the assignments and start to work again
2. when a supervisor goes down, workers still work fine (they will sync
connections from zk, as before); when the supervisor comes back up, it will
just sync the assignments from nimbus
3. when zk is unstable, assignment sync is not affected; only the
heartbeats and the logical plan (StormBase and the like) are
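
The worker-side fallback in case 2 can be sketched roughly as below. This is
an illustration only, under the assumption that both sources can be modelled
as simple suppliers; `WorkerAssignmentSync`, `fetch`, and both parameters are
hypothetical names and do not come from the patch.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch, not part of the patch: a worker prefers the local
// supervisor and falls back to ZooKeeper when the supervisor is unavailable.
final class WorkerAssignmentSync {
    private WorkerAssignmentSync() {}

    /**
     * @param localSupervisor returns the assignment held by the local supervisor,
     *                        an empty Optional if it has none, or throws if the
     *                        supervisor cannot be reached
     * @param zookeeper       returns the assignment stored in ZooKeeper
     */
    static String fetch(Supplier<Optional<String>> localSupervisor,
                        Supplier<String> zookeeper) {
        try {
            Optional<String> fromSupervisor = localSupervisor.get();
            if (fromSupervisor.isPresent()) {
                return fromSupervisor.get();
            }
        } catch (RuntimeException e) {
            // Supervisor is down or unreachable: fall through to ZooKeeper.
        }
        return zookeeper.get();
    }
}
```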
This is my JIRA task: https://issues.apache.org/jira/browse/STORM-2693
@HeartSaVioR can you please help me to review this?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/danny0405/storm schedule-promotion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2319.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2319
----
commit 9e7cde8c48e33811615ab4d36e1f5dad94e8499c
Author: chenyuzhao <[email protected]>
Date: 2017-09-11T03:46:23Z
add local assignment backend and assign assignemnts through RPC
----
---