GitHub user danny0405 opened a pull request: https://github.com/apache/storm/pull/2389

    Storm heartbeats promotion

    Storm currently does not support large clusters (for example, thousands of supervisors) very well. In our production environment, topology submission and killing become very inefficient as the cluster grows. I looked into the heartbeat strategy and found that it can be improved.

    Heartbeat improvements:

    1. Nimbus no longer collects heartbeat info from ZooKeeper on every scheduling round. Instead, it reads directly from a cache that is updated whenever a heartbeat is reported.
    2. Heartbeats are reported through supervisor RPC (the supervisor collects the heartbeats of its local workers from the state they report to the local state store).
    3. Metrics data and heartbeats are separated: the new heartbeat no longer contains metrics info, so it is lightweight and efficient.
    4. Metrics data is still reported to ZooKeeper, but it is used only for collecting UI stats (in the old mode, UI stats came from the heartbeat cache; in the new mode they are fetched from ZooKeeper directly).

    With this new heartbeat mode, heartbeats are reported very efficiently. In production we have about 30 workers per node/supervisor, so I mocked the data and ran a stress test of Nimbus heartbeat response time.

    The results show that with a 1-second heartbeat reporting interval, Nimbus can support at least 2000 nodes. In production we set the worker heartbeat reporting interval to 5 seconds, which means a single cluster can have about 5 * 2000 = 10000 nodes.

    Because we no longer need to collect all heartbeat data and compute alive/free slots on every scheduling round (a precomputed cache is used directly), topologies are scheduled very efficiently (only 2 minutes for 5000 topologies).

    About robustness:

    1. When Nimbus goes down, workers keep running (as in the original mode). When a leader starts up, it waits until it has received a complete set of heartbeats from all nodes before it starts working again. I also made this strategy pluggable, so users can override the default one.
    2. When a supervisor goes down, its workers keep running and report heartbeats directly to Nimbus through RPC. When the supervisor comes back up, it simply collects the heartbeats again and reports them to Nimbus.
    3. When ZooKeeper is unstable, heartbeats are no longer affected (in the old mode this caused all workers to go down).

    This is my JIRA task: https://issues.apache.org/jira/browse/STORM-2693
    This is the assignments promotion PR: https://github.com/apache/storm/pull/2319

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danny0405/storm heartbeats-promotion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2389

----
commit 9e06883b9c253ae71bab052fcbc7753f838d61b3
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-08T05:03:09Z

    STORM-2320: CHANGELOG

commit b621da98562db58062ec87a46238c0f0ccafb96e
Author: Aaron Dossett <aaron.doss...@target.com>
Date:   2016-03-21T18:06:44Z

    STORM-1464: storm-hdfs support for multiple output files and partitioning

commit 657dd8815b5e91d1163e302d6be96510715a4fd7
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T20:07:37Z

    add STORM-1464 to changelog

commit e372489c0fea259a5b2de4d42bc665593326ed8e
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T20:14:43Z

    Merge branch 'cyz-dev' of github.com:danny0405/storm into 1.x-branch

commit 2a7e6dc0543c8069efed21b7ced9472eb46b5237
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T20:15:33Z

    add STORM-2270 to changelog

commit bb5c6b84876da10d842889bb1729ccfab02af7b5
Author: Tibor Kiss <tibor.k...@gmail.com>
Date:   2017-02-07T05:11:32Z

    STORM-2350: Storm-HDFS's listFilesByModificationTime is broken

commit c417c8ee28384b392ddcbd96366047037164280d
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T21:28:06Z

    add STORM-2350 to changelog

commit 1e40b02655072270c829057c14a3570a3d6005b9
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-08T04:59:39Z

    Convert NoNodeException to KeyNotFoundException in getNimbodesWithLatestSequenceNumberOfBlob

    * since callers are able to handle KeyNotFoundException but not NoNodeException

commit f91166cf6ccd721201eac114923879fa2c9a4ba6
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-09T06:09:35Z

    Merge branch 'fix-nonodeexception-getNimbodesWithLatestSequenceNumberOfBlob-1.x' into 1.x-branch

commit 8b49350cca7bb113a3cdb308cf94f3bbf6a08946
Author: ambud <asharma52...@gmail.com>
Date:   2017-02-04T21:32:16Z

    STORM-2344 Adding Flux File Viewer to Nimbus UI

    Adding apache license and link to Storm Homepage
    Adding links from storm nimbus homepage
    Adding License for Javascript libraries.
    Using min js for esprima
    Adding license files

commit ea1c50e2cc68187883abb7222efcaefd7420e947
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-10T03:40:46Z

    Merge branch 'STORM-2344-1.x-merge' into 1.x-branch

commit 2128fc34a8a217c9a1b55edec666f25a2646bee6
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-10T03:41:06Z

    STORM-2344: CHANGELOG

commit f5a1cf0b25be68a2b188f888a419bb14d270e2bc
Author: mingmxu <ming...@ebay.com>
Date:   2017-02-03T20:03:37Z

    STORM-2340 fix AutoCommitMode issue in KafkaSpout

    * Closes #1919
    * fix: KafkaSpout is blocked in AutoCommitMode
    * add comments for impacts of AutoCommitMode
    * add doc about how to use KafkaSpout with at-most-once.
    * remove at-most-once for better describe the changes; emit null msgId when AutoCommitMode;
    * update sample code in storm-kafka-client to use inline setProp()

commit f90d17c9715b6329938f2bd41442da5250a76bdc
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-14T02:53:49Z

    Merge branch 'STORM-2340-1.x-merge' into 1.x-branch

commit 191a806de71d3e7526206b7cb6be7fad8f7da0bd
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-14T02:55:07Z

    STORM-2340: CHANGELOG

commit a03137ed70a3edf155fc2c06355e12f2d4fb38f6
Author: Stig Rohde Døssing <s...@it-minds.dk>
Date:   2017-02-14T20:31:45Z

    STORM-2250: Kafka spout refactoring to increase modularity and testability. Also support nanoseconds in Storm time simulation

commit d14c2935effa914ede12e0e038ebb5b732a1ef62
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-15T21:31:25Z

    Merge branch 'STORM-2250-1.x' of github.com:srdo/storm into 1.x-branch

commit 8b69d43828532646d3e87d95daa250a05fc8a0be
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-15T21:32:27Z

    add STORM-2250 to changelog

commit 17a2017fb644e353fb2a0f5bf50d400ee28036ba
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-16T18:57:43Z

    [maven-release-plugin] prepare release v1.1.0

commit ff80b098b5e2110d326d041b73014f5e9fbff395
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-16T19:01:11Z

    [maven-release-plugin] prepare for next development iteration

commit 2f69242d0b3557feb5dc710b9dcb302abbd72aae
Author: Arun Mahadevan <ar...@apache.org>
Date:   2017-02-15T18:20:49Z

    STORM-2365: Support for specifying output stream in event hubs spout

commit bdb557dd1c40d4a90d036ff5063df2c51ec90863
Author: Satish Duggana <sdugg...@hortonworks.com>
Date:   2017-02-17T10:38:22Z

    Added STORM-2365 to CHANGELOG.md

commit ee1309d2a9b8cdbe4f5266327d7c62c4f9222781
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-20T00:52:04Z

    Fix RAT issue from newly added js files

commit 593d523f874b70ceddcf67fe5dd4fa9af6c8436b
Author: Julien Nioche <jul...@digitalpebble.com>
Date:   2017-02-20T17:32:06Z

    STORM-2326 Upgrade log4j and slf4j

commit ebed1c8b01397b09f4083e66f574a25f9b7c585d
Author: Kyle Nusbaum <knusb...@yahoo-inc.com>
Date:   2017-02-21T20:18:31Z

    Fixing pacemaker delete-path bug.

commit 187d08bf45bf424f3963a604d72e076b00d594c7
Author: Sachin Pasalkar <sachin_pasal...@symantec.com>
Date:   2017-02-14T10:24:23Z

    STORM-1363: TridentKafkaState should handle null values from TridentTupleToKafkaMapper.getMessageFromTuple()

    Incase null value comes from the mapper it will print warning messages also added the time take to emit number od messages in logs

commit 4fc55b33504d446ef192a25e5164f861e1495291
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-22T07:35:53Z

    Merge branch 'STORM-1363-1.x' into 1.x-branch

commit d5f4c4021984bad9044654284f7d43ce03d24f41
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-22T07:39:16Z

    STORM-1363: CHANGELOG

commit d4ff6b51f206e85e65bb5c4d06b4c28c828df174
Author: Hugo Louro <hmclo...@gmail.com>
Date:   2017-02-22T23:03:16Z

    STORM-2374: Storm Kafka Client Func Interface Must be Serializable

commit 5de6e1dd46dadc5451fa0e3669120617aa8bbf8e
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-22T04:03:45Z

    Add Storm SQL docs to index page for 1.x branch

----
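
For readers of the pull request summary, the cache-based heartbeat idea (heartbeats pushed over RPC update an in-memory cache, and each scheduling round reads that cache instead of polling ZooKeeper) can be illustrated with a minimal sketch. This is not code from the pull request or from Storm: it is written in Python for brevity, and the names `HeartbeatCache`, `report`, `alive_workers`, and `timeout_secs` are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable, Dict, Iterable, Set, Tuple

WorkerKey = Tuple[str, str]  # (supervisor id, worker id); hypothetical key shape


class HeartbeatCache:
    """Illustrative in-memory heartbeat cache; not Storm's actual class."""

    def __init__(self, timeout_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_secs = timeout_secs
        self._clock = clock  # injectable so the sketch can be tested deterministically
        self._last_seen: Dict[WorkerKey, float] = {}

    def report(self, supervisor_id: str, worker_ids: Iterable[str]) -> None:
        """Record a heartbeat for each worker named in one supervisor report."""
        now = self._clock()
        for worker_id in worker_ids:
            self._last_seen[(supervisor_id, worker_id)] = now

    def alive_workers(self) -> Set[WorkerKey]:
        """Return workers whose last heartbeat is within the timeout.

        Only the cache is read here; no external store is consulted.
        """
        now = self._clock()
        return {key for key, seen in self._last_seen.items()
                if now - seen <= self._timeout_secs}
```

In this sketch a scheduler would call `alive_workers()` once per round, while `report()` is driven by incoming heartbeat reports, so the cost of a scheduling round does not depend on how many heartbeat records sit in an external store.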