[ 
https://issues.apache.org/jira/browse/STORM-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965619#comment-13965619
 ] 

ASF GitHub Bot commented on STORM-150:
--------------------------------------

GitHub user ptgoetz opened a pull request:

    https://github.com/apache/incubator-storm/pull/71

    STORM-150: Replace jar distribution strategy with bittorrent

    This is a resurrection of a pre-apache pull request:
    https://github.com/nathanmarz/storm/pull/629
    (original discussion and code review can be found there)
    
    I've updated it to merge cleanly with the latest master branch and fixed a
    bug that would prevent multi-lang topologies from working.
    
    Using bittorrent to distribute topology code will make it easier to
    implement HA Nimbus, since multiple nimbus nodes could seed to one
    another when a topology is submitted (i.e. each would have a full copy of
    the code). We could also incorporate it into leader election such that only
    a nimbus node with a complete copy of topology code could be elected leader.
    
    It also helps alleviate the "thundering herd" issue where supervisors all
    hit nimbus at once to download jars.
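
To put rough numbers on the "thundering herd" point, here is a back-of-the-envelope sketch in Python. It is not code from this pull request: the function names, the jar size, the node count, and the uplink speed are all hypothetical, and the swarm model is a deliberate simplification (whole-file transfers, each holder serving one new node per round, identical uplinks), not a description of how ttorrent schedules pieces.

```python
import math


def nimbus_only_seconds(jar_mb, supervisors, uplink_mb_per_s):
    """Every supervisor downloads the jar from nimbus, sharing nimbus's uplink."""
    return supervisors * jar_mb / uplink_mb_per_s


def idealized_swarm_seconds(jar_mb, supervisors, uplink_mb_per_s):
    """Simplified swarm: each node holding the jar uploads it to one new node per round.

    After k rounds there are 2**k holders (nimbus included), so all supervisors
    are covered once 2**k - 1 >= supervisors.
    """
    rounds = math.ceil(math.log2(supervisors + 1))
    return rounds * jar_mb / uplink_mb_per_s


# Hypothetical cluster: 100 MB jar, 100 supervisors, 100 MB/s per node.
print(nimbus_only_seconds(100, 100, 100))      # 100.0
print(idealized_swarm_seconds(100, 100, 100))  # 7.0
```

Under these assumptions the nimbus-only time grows linearly with the number of supervisors, while the swarm time grows only logarithmically.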
    
    I've tested this on multi-node clusters with regular, multi-lang, and
    trident topologies, as well as topology jars larger than 100MB.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ptgoetz/incubator-storm bittorrent

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-storm/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #71
    
----
commit 567f72a6a2c68b5014704fb16f3d9a8ed19e54fa
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-17T20:42:19Z

    add ttorrent BitTorrent client dependency to storm core

commit abe3414d0fe16a2ad5ab465be480821ca9ef3ff8
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-17T20:43:54Z

    Add bittorrent tracker and client code for use by nimbus and supervisor

commit dcb609a4230190eeb7eea2e3e8380728b06020c6
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-17T20:45:39Z

    change nimbus and supervisor to use bittorrent for jar file distribution.

commit f70022d8bddcf4d6d823f3a2c1bc05fe457f07ca
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-17T20:46:32Z

    add bittorrent configuration defaults

commit 539cf33d27b976444fab990b40fedddfd61814fe
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-17T21:00:27Z

    revert some changes that got lost by a stash apply

commit 4a76e4cee834d6a9a52437ceecacd905da91a694
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-24T17:52:10Z

    add configuration values for bittorrent rate limits and seeding

commit 17f1acc3d03e99a1c4c7d96d9921211fb8115373
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-24T17:54:35Z

    implement configurable bitorrent rate limits and seeding for
    nimbus/supervisor

commit 3b56e94ec299cb08c689553cd0daedaa3de943e8
Author: P. Taylor Goetz <[email protected]>
Date:   2013-07-24T17:57:45Z

    modify nimbus/supervisor to exchange all topology code (.jar, .ser) via
    bittorrent

commit bf5de3f994f325fbe5b2283c7e750da35dce5a49
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-14T19:35:40Z

    fix local mode supervisor

commit dec77b7db73160085f0912952d932b34fd1542d8
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-14T20:11:35Z

    only start bittorrent tracker in distributed mode.

commit efd84a682d9ea5b713a66246429afed512e4da6d
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-14T20:22:55Z

    only start Nimbus bittorrent tracker in distributed mode.

commit 2209da90861f9779dd4d621b5d6e11ba46493599
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-26T14:35:35Z

    address issues identified in #629 review.

commit 3b8059cf840c832a65bae8c27f447b2e82908e9f
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-26T21:10:46Z

    synchronize rebalanceRates() method and correct indentation.

commit 24ae037cda883b97351d19627578a7896d519801
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-27T18:28:45Z

    Rename BaseTracker and SupervisorTracker to BasePeer and SupervisorPeer

commit 755bebf0b98b9f5cb689b54e9c766b923c662733
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-27T18:30:35Z

    organize imports

commit f784ef3ae596a221cfd25b45c70c38e1737367b6
Author: P. Taylor Goetz <[email protected]>
Date:   2013-09-27T18:32:53Z

    Rename BaseTracker, SupervisorTracker --> BasePeer, SupervisorPeer

commit ff79d6f857faf1002e0577295dd2c73e8ce2a482
Author: P. Taylor Goetz <[email protected]>
Date:   2013-10-08T14:51:36Z

    exclude log4j dependencies from ttorrent dependency

commit 6b85db929a970e18c460a5b5086b2bca24087651
Author: P. Taylor Goetz <[email protected]>
Date:   2014-03-13T03:06:11Z

    Merge branch 'master' into bittorrent
    
    Conflicts:
        conf/defaults.yaml
        storm-core/project.clj
        storm-core/src/clj/backtype/storm/config.clj
        storm-core/src/jvm/backtype/storm/Config.java

commit 4df45cb6713776517e0579b093cc646b0e578ec2
Author: P. Taylor Goetz <[email protected]>
Date:   2014-03-13T03:14:32Z

    update dependencies and clojure code

commit cc56dac5dbbe6f5f2c53254f9d9964378af4c975
Author: P. Taylor Goetz <[email protected]>
Date:   2014-04-10T17:30:16Z

    fix for multilang resources being extracted to the wrong location

----


> Replace jar distribution strategy with bittorent
> ------------------------------------------------
>
>                 Key: STORM-150
>                 URL: https://issues.apache.org/jira/browse/STORM-150
>             Project: Apache Storm (Incubating)
>          Issue Type: Improvement
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/435
> Consider using http://turn.github.com/ttorrent/
> ----------
> ptgoetz: I've been looking into implementing this, but have a design question 
> that boils down to this:
> Should the client (storm jar command) be the initial seeder, or should we 
> wait and let nimbus do the seeding?
> The benefit of doing the seeding client-side is that we would only have to 
> transfer a small .torrent through the nimbus thrift API. But I can imagine 
> situations where the network environment would prevent BitTorrent clients 
> from connecting back to the machine that's submitting the topology. This 
> would create an indefinitely "stalled submission" since none of the cluster 
> nodes would be able to connect to the seeder.
> The alternative would be to use the current technique of uploading the jar to 
> nimbus, and have nimbus generate and distribute the .torrent file, and 
> provide the initial seed. If the cluster is properly configured, we're pretty 
> much guaranteed connectivity between nimbus and supervisor nodes.
> I'm leaning toward the latter approach, but would be interested in others' 
> opinions.
> ----------
> nathanmarz: @ptgoetz I think Nimbus should do the seeding. That ensures that 
> when the client finishes submitting, it can disconnect/go away without having 
> to worry about making the topology unlaunchable.
> ----------
> jasonjckn: @nathanmarz How does this solve Nimbus's dependency on 
> reliable local disk state (as you talked about in person)?
> What happens when zookeeper is offline for 1 hour? All the workers will die, 
> and nimbus will be continually restarting. The onus is still on nimbus to 
> store topology jars on local disk, so that when the workers and supervisors 
> reboot it can seed all this again.
> You -can- solve the local disk persistence problem with replicated state to 
> the non-elected nimbuses, but that's orthogonal to a distribution strategy. 
> Yes there is some replication going on in bittorrent, but it's not really a 
> protocol that delivers reliable persistence of state.
> I think it's still a good feature if it gives us performant topology submit 
> times even with 500 workers, which takes 3 minutes for us.
> Particularly with the worker heartbeat start-up timeout of 120s, you want to 
> be able to start 500 workers within 120s, or even 1500 workers within 120s; 
> the current distribution strategy is not scalable in that way.
> ----------
> nathanmarz: @jasonjckn On top of the bittorrent stuff we can ensure that a 
> topology is considered submitted only when the artifacts exist on at least N 
> nodes. Nimbus would only be the initial seed for topology files. Also, it 
> wouldn't have to only be Nimbus that acts as a seed, that work could be 
> shared by the supervisors. That's less relevant in the storm-mesos world, but 
> you could still fairly easily run multiple Nimbus instances to get replication.
> ----------
> jasonjckn: This PR might be aided by "topology packages" #557, as it bundles 
> all the state that needs to be replicated.
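
The two replication ideas raised in this thread (a topology counts as submitted only once its artifacts exist on at least N nodes, and only a nimbus node with a complete copy of the topology code may be elected leader) can be sketched roughly as follows. This is a minimal illustration, not code from the pull request: the function names, node names, and the `complete_copies` mapping are hypothetical, and how a node would actually report a finished download is left out.

```python
def is_submitted(nodes_with_full_copy, min_replicas):
    """A topology counts as submitted once at least min_replicas distinct nodes hold all its artifacts."""
    return len(set(nodes_with_full_copy)) >= min_replicas


def eligible_leaders(nimbus_nodes, active_topologies, complete_copies):
    """Return the nimbus nodes that hold a complete copy of every active topology.

    complete_copies maps a node name to the set of topology ids it has fully downloaded.
    """
    needed = set(active_topologies)
    return [node for node in nimbus_nodes
            if needed <= complete_copies.get(node, set())]


# Hypothetical state: nimbus-2 is still missing topo-b.
copies = {"nimbus-1": {"topo-a", "topo-b"}, "nimbus-2": {"topo-a"}}
print(is_submitted(["nimbus-1", "supervisor-3"], min_replicas=2))               # True
print(eligible_leaders(["nimbus-1", "nimbus-2"], ["topo-a", "topo-b"], copies))  # ['nimbus-1']
```

Keeping both checks as pure functions over reported state makes the threshold N and the eligibility rule easy to test independently of the transfer mechanism.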



--
This message was sent by Atlassian JIRA
(v6.2#6252)
