[ 
https://issues.apache.org/jira/browse/STORM-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962014#comment-13962014
 ] 

ASF GitHub Bot commented on STORM-166:
--------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/incubator-storm/pull/61#issuecomment-39755353
  
    I have done a quick pass through the code.  It looks like there are several 
places where the code leaks connections to ZK.  I am also concerned about the 
extra load that this may place on ZK.  ZK is already the bottleneck for 
scalability of the cluster, and every new client connection results in a write 
operation to the cluster.  Having every client make a connection is bad, but 
not a deal-breaker.  However, also having the existing daemons constantly make 
new connections to ZK feels like it will cause a lot of scalability issues.
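
To illustrate the connection concern, here is a minimal sketch. It is not code 
from the pull request and does not use the real ZooKeeper or Curator API: 
`FakeZkConnection` and `ZkUsage` are hypothetical stand-ins that only count 
sessions. It contrasts a call that leaks a connection, a short-lived client 
that always closes its connection, and a long-running daemon that reuses one.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ZooKeeper client; NOT the real ZooKeeper or Curator API.
class FakeZkConnection implements AutoCloseable {
    static final AtomicInteger opened = new AtomicInteger(); // sessions ever created
    static final AtomicInteger open = new AtomicInteger();   // sessions currently open

    FakeZkConnection() {
        // In the scenario described above, each new session costs the ensemble a write.
        opened.incrementAndGet();
        open.incrementAndGet();
    }

    String read(String path) {
        return "data@" + path;
    }

    @Override
    public void close() {
        open.decrementAndGet();
    }
}

class ZkUsage {
    private static FakeZkConnection shared;

    // Leaky pattern: a new connection per call that is never closed.
    static String leakyRead(String path) {
        FakeZkConnection conn = new FakeZkConnection();
        return conn.read(path);
    }

    // Short-lived client: try-with-resources guarantees close() on every path.
    static String scopedRead(String path) {
        try (FakeZkConnection conn = new FakeZkConnection()) {
            return conn.read(path);
        }
    }

    // Long-running daemon: create one connection lazily and reuse it for every call.
    static synchronized String sharedRead(String path) {
        if (shared == null) {
            shared = new FakeZkConnection();
        }
        return shared.read(path);
    }
}
```

With the shared variant, repeated reads from a daemon cost one session in 
total rather than one session per read.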


> Highly available Nimbus
> -----------------------
>
>                 Key: STORM-166
>                 URL: https://issues.apache.org/jira/browse/STORM-166
>             Project: Apache Storm (Incubating)
>          Issue Type: New Feature
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/360
> The goal of this feature is to be able to run multiple Nimbus servers so that 
> if one goes down another one will transparently take over. Here's what needs 
> to happen to implement this:
> 1. Everything currently stored on local disk on Nimbus needs to be stored in 
> a distributed and reliable fashion. A DFS is perfect for this. However, as we 
> do not want to make a DFS a mandatory requirement to run Storm, the storage 
> of these artifacts should be pluggable (default to local filesystem, but the 
> interface should support DFS). You would only be able to run multiple Nimbus 
> servers if you use the right storage, and the storage interface chosen should 
> have a flag indicating whether it's suitable for HA mode or not. If you choose 
> local storage and try to run multiple Nimbus servers, one of them should fail 
> to launch.
> 2. Nimbus servers should register themselves in Zookeeper. They should use a 
> leader election protocol to decide which one is currently responsible for 
> launching and monitoring topologies.
> 3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case 
> the leader changes during submission, it should use a retry protocol to try 
> reconnecting to the new leader and attempting submission again.
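
Items 1 and 3 of the description above can be sketched roughly as follows. 
This is an illustration under stated assumptions, not the design from the pull 
request: `NimbusStorage`, `isHaCapable`, `LeaderChangedException`, `Submission`, 
and `HaSketch` are all hypothetical names, the leader lookup is passed in as a 
plain `Supplier` rather than a real Zookeeper query, and a real retry protocol 
would likely add a backoff between attempts.

```java
import java.util.function.Supplier;

// Hypothetical storage abstraction for item 1; names are illustrative only.
interface NimbusStorage {
    boolean isHaCapable(); // the "suitable for HA mode" flag
}

class LocalFsStorage implements NimbusStorage {
    public boolean isHaCapable() { return false; }
}

class DfsStorage implements NimbusStorage {
    public boolean isHaCapable() { return true; }
}

// Hypothetical signal that the contacted Nimbus is no longer the leader.
class LeaderChangedException extends RuntimeException {
    LeaderChangedException(String message) { super(message); }
}

// Hypothetical submission call against a specific Nimbus address.
interface Submission {
    void submitTo(String leader);
}

class HaSketch {
    // Item 1: refuse to launch an additional Nimbus when the storage is not HA-capable.
    static void checkLaunch(NimbusStorage storage, int nimbusAlreadyRunning) {
        if (nimbusAlreadyRunning > 0 && !storage.isHaCapable()) {
            throw new IllegalStateException("storage is not suitable for HA mode");
        }
    }

    // Item 3: look up the current leader before every attempt and retry if it changed.
    // Returns the leader that accepted the submission.
    static String submitWithRetry(Supplier<String> leaderLookup,
                                  Submission submission,
                                  int maxAttempts) {
        LeaderChangedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String leader = leaderLookup.get();
            try {
                submission.submitTo(leader);
                return leader;
            } catch (LeaderChangedException e) {
                last = e; // leader moved during submission; look it up again
            }
        }
        throw last != null ? last : new LeaderChangedException("no attempts made");
    }
}
```

Looking up the leader inside the loop, rather than once up front, is what lets 
a submission that races with a failover land on the new leader.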



--
This message was sent by Atlassian JIRA
(v6.2#6252)
