GitHub user danny0405 opened a pull request:

    https://github.com/apache/storm/pull/2389

    Storm heartbeats promotion

    Storm currently does not handle large clusters (for example, thousands of 
supervisors) very well. In our production environment, topology submission and 
killing become very slow as the cluster grows. I looked into the heartbeat 
strategy and found that it can be improved.
    
    Heartbeat improvements:
    
    1. Nimbus no longer collects heartbeat info from ZooKeeper in every 
scheduling round. Instead, it reads directly from a cache that is updated 
whenever a heartbeat is reported.
    2. Heartbeats are reported through supervisor RPC. [The supervisor collects 
the heartbeats that its local workers write to the local state store.]
    3. Metrics data and heartbeats are separated: the new heartbeat no longer 
contains metrics info, so it is very lightweight and efficient.
    4. Metrics data is still reported to ZooKeeper, but it is only used for 
collecting UI stats. [In the old mode, UI stats came from the heartbeat cache; 
in the new mode they are fetched from ZooKeeper directly.]
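
    As a rough illustration of items 1 and 2 above, here is a minimal sketch 
of a cache that is updated when a heartbeat report arrives and read by the 
scheduler without touching ZooKeeper. The class and method names 
(`HeartbeatCache`, `report`, `aliveWorkers`) are hypothetical and are not 
taken from the Storm code base or from this patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these names are illustrative, not Storm's actual API.
final class HeartbeatCache {
    // worker id -> time (ms) of the last heartbeat received for that worker
    private final Map<String, Long> lastSeenMs = new ConcurrentHashMap<>();
    private final long timeoutMs;

    HeartbeatCache(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Would be called from the RPC handler when a supervisor reports
    // a batch of heartbeats for its local workers.
    void report(Iterable<String> workerIds, long nowMs) {
        for (String id : workerIds) {
            lastSeenMs.put(id, nowMs);
        }
    }

    // Would be called by the scheduler: it only reads the in-memory cache.
    Set<String> aliveWorkers(long nowMs) {
        Set<String> alive = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeenMs.entrySet()) {
            if (nowMs - e.getValue() <= timeoutMs) {
                alive.add(e.getKey());
            }
        }
        return alive;
    }
}
```

    The point of the design is that the cost of a scheduling round depends 
only on the size of the in-memory map, not on the number of ZooKeeper reads.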
    
    
![heartbeats-promotion](https://user-images.githubusercontent.com/7644508/32034188-be12919c-b9d6-11e7-9740-631aef5ff4b4.png)
    
    With this new heartbeat mode, heartbeats are reported very efficiently. In 
our production environment we have about 30 workers per node/supervisor, so I 
mocked that data and ran a stress test of Nimbus heartbeat response time:
    
![image](https://user-images.githubusercontent.com/7644508/32034309-88f3b210-b9d7-11e7-8798-139a7eca10de.png)
    
    We can see that at a heartbeat report interval of 1 second, Nimbus 
supports at least 2000 nodes. In production we set the worker heartbeat 
reporting interval to 5 seconds, which means a single cluster can have about 
5 * 2000 = 10000 nodes.
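
    The estimate above can be spelled out as a small calculation. This is 
only a sketch under the assumption that Nimbus load scales linearly with the 
number of heartbeat reports per second; the names `HeartbeatCapacity`, 
`maxNodes` and `reportsPerSecond` are made up for illustration.

```java
// Hypothetical helper: assumes load on Nimbus is proportional to the
// number of heartbeat reports arriving per second.
final class HeartbeatCapacity {
    // Nodes supported at a given interval, given the node count
    // measured at a 1-second interval.
    static long maxNodes(long nodesAtOneSecond, long intervalSeconds) {
        return nodesAtOneSecond * intervalSeconds;
    }

    // Heartbeat reports per second that Nimbus has to handle.
    static double reportsPerSecond(long nodes, long intervalSeconds) {
        return (double) nodes / intervalSeconds;
    }
}
```

    With the measured 2000 nodes at 1 second, `maxNodes(2000, 5)` gives the 
same report rate at Nimbus as the tested configuration.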
    
    Because we no longer need to collect all heartbeat data and compute 
alive/free slots in every scheduling round [we use a precomputed cache 
directly], topologies are scheduled very efficiently [only 2 minutes for 5000 
topologies].
    
    About robustness:
    1. When Nimbus goes down, workers keep running fine [as before]. When the 
leader starts up, it waits until it has received a complete set of heartbeats 
from all nodes and then starts working again. I also made this strategy 
pluggable, so users can override the default one.
    2. When a supervisor goes down, workers still work fine [they report 
heartbeats directly to Nimbus through RPC]. When the supervisor comes back up, 
it simply resumes collecting the heartbeats and reporting them to Nimbus.
    3. When ZooKeeper is unstable, heartbeats are no longer affected [in the 
old mode this would cause all workers to go down].
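
    To make the pluggable recovery behaviour in item 1 more concrete, here is 
a minimal sketch of what such a strategy could look like: the leader is 
considered ready only after every expected node has reported at least once. 
The interface and class names are hypothetical; the actual interface in this 
patch may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: a user could plug in another implementation.
interface HeartbeatRecoveryStrategy {
    // Record that a node has reported heartbeats since the leader started.
    void reportNode(String nodeId);

    // True once the leader may start scheduling again.
    boolean isReady();
}

// Default behaviour sketched from the description: wait for all nodes.
final class WaitForAllNodesStrategy implements HeartbeatRecoveryStrategy {
    // Nodes that have not reported yet.
    private final Set<String> pending;

    WaitForAllNodesStrategy(Set<String> expectedNodes) {
        this.pending = new HashSet<>(expectedNodes);
    }

    @Override
    public synchronized void reportNode(String nodeId) {
        pending.remove(nodeId);
    }

    @Override
    public synchronized boolean isReady() {
        return pending.isEmpty();
    }
}
```

    An alternative strategy could, for example, also become ready after a 
timeout so that one dead node does not block the leader indefinitely.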
    
    This is my JIRA task: https://issues.apache.org/jira/browse/STORM-2693
    This is the assignments promotion PR: 
https://github.com/apache/storm/pull/2319

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danny0405/storm heartbeats-promotion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2389
    
----
commit 9e06883b9c253ae71bab052fcbc7753f838d61b3
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-08T05:03:09Z

    STORM-2320: CHANGELOG

commit b621da98562db58062ec87a46238c0f0ccafb96e
Author: Aaron Dossett <aaron.doss...@target.com>
Date:   2016-03-21T18:06:44Z

    STORM-1464: storm-hdfs support for multiple output files and partitioning

commit 657dd8815b5e91d1163e302d6be96510715a4fd7
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T20:07:37Z

    add STORM-1464 to changelog

commit e372489c0fea259a5b2de4d42bc665593326ed8e
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T20:14:43Z

    Merge branch 'cyz-dev' of github.com:danny0405/storm into 1.x-branch

commit 2a7e6dc0543c8069efed21b7ced9472eb46b5237
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T20:15:33Z

    add STORM-2270 to changelog

commit bb5c6b84876da10d842889bb1729ccfab02af7b5
Author: Tibor Kiss <tibor.k...@gmail.com>
Date:   2017-02-07T05:11:32Z

    STORM-2350: Storm-HDFS's listFilesByModificationTime is broken

commit c417c8ee28384b392ddcbd96366047037164280d
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-08T21:28:06Z

    add STORM-2350 to changelog

commit 1e40b02655072270c829057c14a3570a3d6005b9
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-08T04:59:39Z

    Convert NoNodeException to KeyNotFoundException in 
getNimbodesWithLatestSequenceNumberOfBlob
    
    * since callers are able to handle KeyNotFoundException but not 
NoNodeException

commit f91166cf6ccd721201eac114923879fa2c9a4ba6
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-09T06:09:35Z

    Merge branch 
'fix-nonodeexception-getNimbodesWithLatestSequenceNumberOfBlob-1.x' into 
1.x-branch

commit 8b49350cca7bb113a3cdb308cf94f3bbf6a08946
Author: ambud <asharma52...@gmail.com>
Date:   2017-02-04T21:32:16Z

    STORM-2344 Adding Flux File Viewer to Nimbus UI
    
    Adding apache license and link to Storm Homepage
    
    Adding links from storm nimbus homepage
    
    Adding License for Javascript libraries. Using min js for esprima
    
    Adding license files

commit ea1c50e2cc68187883abb7222efcaefd7420e947
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-10T03:40:46Z

    Merge branch 'STORM-2344-1.x-merge' into 1.x-branch

commit 2128fc34a8a217c9a1b55edec666f25a2646bee6
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-10T03:41:06Z

    STORM-2344: CHANGELOG

commit f5a1cf0b25be68a2b188f888a419bb14d270e2bc
Author: mingmxu <ming...@ebay.com>
Date:   2017-02-03T20:03:37Z

    STORM-2340 fix AutoCommitMode issue in KafkaSpout
    
    * Closes #1919
    * fix: KafkaSpout is blocked in AutoCommitMode
    * add comments for impacts of AutoCommitMode
    * add doc about how to use KafkaSpout with at-most-once.
    * remove at-most-once for better describe the changes; emit null msgId when 
AutoCommitMode;
    * update sample code in storm-kafka-client to use inline setProp()

commit f90d17c9715b6329938f2bd41442da5250a76bdc
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-14T02:53:49Z

    Merge branch 'STORM-2340-1.x-merge' into 1.x-branch

commit 191a806de71d3e7526206b7cb6be7fad8f7da0bd
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-14T02:55:07Z

    STORM-2340: CHANGELOG

commit a03137ed70a3edf155fc2c06355e12f2d4fb38f6
Author: Stig Rohde Døssing <s...@it-minds.dk>
Date:   2017-02-14T20:31:45Z

    STORM-2250: Kafka spout refactoring to increase modularity and testability. 
Also support nanoseconds in Storm time simulation

commit d14c2935effa914ede12e0e038ebb5b732a1ef62
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-15T21:31:25Z

    Merge branch 'STORM-2250-1.x' of github.com:srdo/storm into 1.x-branch

commit 8b69d43828532646d3e87d95daa250a05fc8a0be
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-15T21:32:27Z

    add STORM-2250 to changelog

commit 17a2017fb644e353fb2a0f5bf50d400ee28036ba
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-16T18:57:43Z

    [maven-release-plugin] prepare release v1.1.0

commit ff80b098b5e2110d326d041b73014f5e9fbff395
Author: P. Taylor Goetz <ptgo...@gmail.com>
Date:   2017-02-16T19:01:11Z

    [maven-release-plugin] prepare for next development iteration

commit 2f69242d0b3557feb5dc710b9dcb302abbd72aae
Author: Arun Mahadevan <ar...@apache.org>
Date:   2017-02-15T18:20:49Z

    STORM-2365: Support for specifying output stream in event hubs spout

commit bdb557dd1c40d4a90d036ff5063df2c51ec90863
Author: Satish Duggana <sdugg...@hortonworks.com>
Date:   2017-02-17T10:38:22Z

    Added STORM-2365 to CHANGELOG.md

commit ee1309d2a9b8cdbe4f5266327d7c62c4f9222781
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-20T00:52:04Z

    Fix RAT issue from newly added js files

commit 593d523f874b70ceddcf67fe5dd4fa9af6c8436b
Author: Julien Nioche <jul...@digitalpebble.com>
Date:   2017-02-20T17:32:06Z

    STORM-2326 Upgrade log4j and slf4j

commit ebed1c8b01397b09f4083e66f574a25f9b7c585d
Author: Kyle Nusbaum <knusb...@yahoo-inc.com>
Date:   2017-02-21T20:18:31Z

    Fixing pacemaker delete-path bug.

commit 187d08bf45bf424f3963a604d72e076b00d594c7
Author: Sachin Pasalkar <sachin_pasal...@symantec.com>
Date:   2017-02-14T10:24:23Z

    STORM-1363: TridentKafkaState should handle null values from 
TridentTupleToKafkaMapper.getMessageFromTuple()
    
    Incase null value comes from the mapper it will print warning messages also 
added the time take to emit number od messages in logs

commit 4fc55b33504d446ef192a25e5164f861e1495291
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-22T07:35:53Z

    Merge branch 'STORM-1363-1.x' into 1.x-branch

commit d5f4c4021984bad9044654284f7d43ce03d24f41
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-22T07:39:16Z

    STORM-1363: CHANGELOG

commit d4ff6b51f206e85e65bb5c4d06b4c28c828df174
Author: Hugo Louro <hmclo...@gmail.com>
Date:   2017-02-22T23:03:16Z

    STORM-2374: Storm Kafka Client Func Interface Must be Serializable

commit 5de6e1dd46dadc5451fa0e3669120617aa8bbf8e
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2017-02-22T04:03:45Z

    Add Storm SQL docs to index page for 1.x branch

----

