[ 
https://issues.apache.org/jira/browse/STORM-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333431#comment-14333431
 ] 

ASF GitHub Bot commented on STORM-150:
--------------------------------------

Github user d2r commented on the pull request:

    https://github.com/apache/storm/pull/71#issuecomment-75572223
  
    This pull request has been open for most of a year.  Should we close this 
for now if we are not planning to do anything about it?


> Replace jar distribution strategy with BitTorrent
> -------------------------------------------------
>
>                 Key: STORM-150
>                 URL: https://issues.apache.org/jira/browse/STORM-150
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/435
> Consider using http://turn.github.com/ttorrent/
> ----------
> ptgoetz: I've been looking into implementing this, but have a design question 
> that boils down to this:
> Should the client (storm jar command) be the initial seeder, or should we 
> wait and let nimbus do the seeding?
> The benefit of doing the seeding client-side is that we would only have to 
> transfer a small .torrent through the nimbus thrift API. But I can imagine 
> situations where the network environment would prevent BitTorrent clients 
> from connecting back to the machine that's submitting the topology. This 
> would create an indefinitely "stalled submission" since none of the cluster 
> nodes would be able to connect to the seeder.
> The alternative would be to use the current technique of uploading the jar to 
> nimbus, and have nimbus generate and distribute the .torrent file, and 
> provide the initial seed. If the cluster is properly configured, we're pretty 
> much guaranteed connectivity between nimbus and supervisor nodes.
> I'm leaning toward the latter approach, but would be interested in others' 
> opinions.
> ----------
> nathanmarz: @ptgoetz I think Nimbus should do the seeding. That ensures that 
> when the client finishes submitting, it can disconnect/go away without having 
> to worry about making the topology unlaunchable.
> ----------
> jasonjckn: @nathanmarz How does this solve nimbus's dependency on 
> reliable local disk state (as you talked about in person)?
> What happens when zookeeper is offline for 1 hour? All the workers will die, 
> and nimbus will be continually restarting. The onus is still on nimbus to 
> store topology jars on local disk, so that when the workers and supervisors 
> reboot it can seed all of this again.
> You -can- solve the local disk persistence problem with replicated state to 
> the non-elected nimbuses, but that's orthogonal to a distribution strategy. 
> Yes there is some replication going on in bittorrent, but it's not really a 
> protocol that delivers reliable persistence of state.
> I think it's still a good feature if it gives us fast topology submit 
> times even with 500 workers, which currently takes 3 minutes for us.
> Particularly with the worker heartbeat start-up timeout of 120s, you want to 
> be able to start 500 workers within 120s, or even 1500 workers within 120s. 
> The current distribution strategy does not scale that way.
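
To make the scaling concern above concrete, here is a rough back-of-envelope
model, not a measurement. It compares one server uploading the jar to every
node in turn against an idealized swarm in which every node that already
holds the jar forwards one full copy per round. The function names, the
50 MB jar size, and the 100 MB/s uplink are assumptions chosen for
illustration; real BitTorrent transfers pieces rather than whole files, so
actual times will differ.

```python
def single_source_seconds(nodes, jar_mb, uplink_mb_per_s):
    """One server sends a full copy to each node over a single shared uplink."""
    return nodes * jar_mb / uplink_mb_per_s


def idealized_swarm_seconds(nodes, jar_mb, uplink_mb_per_s):
    """Whole-file doubling: each round, every holder sends one full copy.

    Starting from one seed, r rounds reach 2**r - 1 receiving nodes, so the
    smallest sufficient r equals nodes.bit_length() for nodes >= 1.
    Assumes every node has the same uplink as the seed (hypothetical).
    """
    rounds = nodes.bit_length()
    return rounds * jar_mb / uplink_mb_per_s


if __name__ == "__main__":
    # Hypothetical inputs: 500 receiving nodes, 50 MB jar, 100 MB/s uplink.
    print(single_source_seconds(500, 50, 100))
    print(idealized_swarm_seconds(500, 50, 100))
```

Under these assumed numbers the single-source time grows linearly with the
node count, while the idealized swarm time grows with its logarithm, which
is the property that matters for a fixed 120s start-up timeout.
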
> ----------
> nathanmarz: @jasonjckn On top of the bittorrent stuff we can ensure that a 
> topology is considered submitted only when the artifacts exist on at least N 
> nodes. Nimbus would only be the initial seed for topology files. Also, it 
> wouldn't have to be only Nimbus that acts as a seed; that work could be 
> shared by the supervisors. That's less relevant in the storm-mesos world, but 
> you could still fairly easily run multiple Nimbus instances to get replication.
> ----------
> jasonjckn: This PR might be aided by "topology packages" #557, as it bundles 
> all the state that needs to be replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
