[jira] [Commented] (STORM-150) Replace jar distribution strategy with bittorent

ASF GitHub Bot (JIRA) Tue, 15 Apr 2014 10:54:02 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969813#comment-13969813
 ]


ASF GitHub Bot commented on STORM-150:
--------------------------------------

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/incubator-storm/pull/71#discussion_r11647620
  
    --- Diff: storm-core/src/jvm/backtype/storm/torrent/BasePeer.java ---
    @@ -0,0 +1,34 @@
    +package backtype.storm.torrent;
    +
    +import java.util.HashMap;
    +
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import com.turn.ttorrent.client.Client;
    +
    +public abstract class BasePeer {
    +    private static final Logger LOG = 
LoggerFactory.getLogger(BasePeer.class);
    +
    +    protected HashMap<String, Client> clients = new HashMap<String, 
Client>();
    --- End diff --
    
    It is not very clear here what the key is to the clients.  Could we either 
have a comment that this is topology ID, or a clearer name. 


> Replace jar distribution strategy with bittorent
> ------------------------------------------------
>
>                 Key: STORM-150
>                 URL: https://issues.apache.org/jira/browse/STORM-150
>             Project: Apache Storm (Incubating)
>          Issue Type: Improvement
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/435
> Consider using http://turn.github.com/ttorrent/
> ----------
> ptgoetz: I've been looking into implementing this, but have a design question 
> that boils down to this:
> Should the client (storm jar command) be the initial seeder, or should we 
> wait and let nimbus do the seeding?
> The benefit of doing the seeding client-side is that we would only have to 
> transfer a small .torrent through the nimbus thrift API. But I can imagine 
> situations where the network environment would prevent BitTorrent clients 
> from connecting back to the machine that's submitting the topology. This 
> would create an indefinitely "stalled submission" since none of the cluster 
> nodes would be able to connect to the seeder.
> The alternative would be to use the current technique of uploading the jar to 
> nimbus, and have nimbus generate and distribute the .torrent file, and 
> provide the initial seed. If the cluster is properly configured, we're pretty 
> much guaranteed connectivity between nimbus and supervisor nodes.
> I'm leaning toward the latter approach, but would be interested in others' 
> opinions.
> ----------
> nathanmarz: @ptgoetz I think Nimbus should do the seeding. That ensures that 
> when the client finishes submitting, it can disconnect/go away without having 
> to worry about making the topology unlaunchable.
> ----------
> jasonjckn: @nathanmarz How does this solve the nimbuses dependency on 
> reliable local disk state (as you talked about in person)?
> What happens when zookeeper is offline for 1 hour? All the workers will die, 
> and nimbus will be continually restarting. The onus is still on nimbus to 
> store topology jars on local disk, so that when the workers and supervisors 
> reboots it can seed all this again.
> You -can- solve the local disk persistence problem with replicated state to 
> the non-elected nimbuses, but that's orthogonal to a distribution strategy. 
> Yes there is some replication going on in bittorrent, but it's not really a 
> protocol that delivers reliable persistence of state.
> I think it's still a good feature if it gives us performant topology submit 
> times even with 500 workers, which take 3 minutes for us.
> Particularly with the worker heartbeat start-up timeout of 120s, you want to 
> be able to start 500 workers within 120s, or even 1500 workers within 120s, 
> the current distribution strategy is not scalable in that way.
> ----------
> nathanmarz: @jasonjckn On top of the bittorrent stuff we can ensure that a 
> topology is considered submitted only when the artifacts exist on at least N 
> nodes. Nimbus would only be the initial seed for topology files. Also, it 
> wouldn't have to only be Nimbus that acts as a seed, that work could be 
> shared by the supervisors. That's less relevant in the storm-mesos world, but 
> you could still fairly easily run multiple Nimbus's to get replication.
> ----------
> jasonjckn: This PR might be aided by "topology packages" #557, as it bundles 
> all the state that needs to be replicated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (STORM-150) Replace jar distribution strategy with bittorent

Reply via email to