GitHub user danny0405 opened a pull request: https://github.com/apache/storm/pull/2319
[STORM-2693] Nimbus assignments promotion

Storm currently does not handle large clusters (for example, thousands of supervisors) very well. In our production environment, topology submission and killing become very inefficient as the cluster grows. I looked at the current assignment strategy and found that it can be improved.

The proposed changes:

1. Nimbus stores the assignments on local disk.
2. On restart, or when an HA leader change is triggered, nimbus recovers the assignments from ZooKeeper to local disk.
3. In every scheduling round, nimbus sends each supervisor its assignment through RPC [only the nodes whose assignments changed are notified].
4. Apart from the nimbus notifications, supervisors also sync assignments at a fixed interval [via an RPC request to nimbus].
5. Workers sync assignments only from the local supervisor [or from ZooKeeper when the local supervisor is down].

<img width="603" alt="2fa30cd8-af15-4352-992d-a67bd724e7fb" src="https://user-images.githubusercontent.com/7644508/30267044-d0758492-9713-11e7-87cc-09af890aced9.png">

I have tested this in our cluster. With the new RPC-based assignment distribution, supervisors respond to assignment changes very quickly [within milliseconds] and efficiently [only the nodes whose assignments changed are notified]. It also keeps the full robustness of the old ZooKeeper mode:

1. When nimbus goes down, workers keep working [as before]. When a leader starts up, it syncs the assignments and resumes work.
2. When a supervisor goes down, workers keep working [they sync connections from ZooKeeper as before]. When the supervisor comes back up, it simply syncs the assignments from nimbus.
3. When ZooKeeper is unstable, assignment sync is not affected; only the heartbeats and the logical plan [StormBase or something] are.

This is my JIRA task: https://issues.apache.org/jira/browse/STORM-2693

@HeartSaVioR can you please help me to review this?
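
To make the distribution flow described above concrete, here is a minimal sketch of the "only notify changed nodes" push plus the supervisor's fixed-interval pull. It is not the code in this pull request: the names `Nimbus`, `Supervisor`, `changed_nodes`, `schedule_round`, and `periodic_sync`, and the representation of an assignment as a set of (topology, port) pairs, are all assumptions made for illustration, and the RPC layer is replaced by a plain callback.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical model: each node id maps to the set of (topology, port) slots on it.
Assignment = FrozenSet[Tuple[str, int]]
EMPTY: Assignment = frozenset()


def changed_nodes(old: Dict[str, Assignment], new: Dict[str, Assignment]) -> Set[str]:
    """Return the node ids whose assignment differs between two scheduling rounds."""
    return {
        node
        for node in old.keys() | new.keys()
        if old.get(node, EMPTY) != new.get(node, EMPTY)
    }


class Nimbus:
    """Keeps the latest assignments and pushes only the changed ones."""

    def __init__(self, notify: Callable[[str, Assignment], None]) -> None:
        self._assignments: Dict[str, Assignment] = {}
        self._notify = notify  # stand-in for the RPC call to a supervisor

    def schedule_round(self, new_assignments: Dict[str, Assignment]) -> Set[str]:
        changed = changed_nodes(self._assignments, new_assignments)
        self._assignments = dict(new_assignments)
        for node in sorted(changed):
            # Nodes whose assignment is unchanged receive no notification.
            self._notify(node, new_assignments.get(node, EMPTY))
        return changed

    def get_assignment(self, node: str) -> Assignment:
        """Stand-in for the RPC endpoint a supervisor polls."""
        return self._assignments.get(node, EMPTY)


class Supervisor:
    """Holds the local copy of the assignment that workers would read."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.local: Assignment = EMPTY

    def on_push(self, assignment: Assignment) -> None:
        self.local = assignment

    def periodic_sync(self, nimbus: Nimbus) -> None:
        # Fallback pull, run at a fixed interval in case a push was missed.
        self.local = nimbus.get_assignment(self.node)
```

In this sketch a supervisor that misses a push (for example because it was restarting) converges on the next `periodic_sync` call, which mirrors the role of the fixed-interval sync in item 4.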
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danny0405/storm schedule-promotion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2319

----
commit 9e7cde8c48e33811615ab4d36e1f5dad94e8499c
Author: chenyuzhao <chenyuz...@meituan.com>
Date:   2017-09-11T03:46:23Z

    add local assignment backend and assign assignemnts through RPC

----

---