James Xu created STORM-166:
------------------------------

             Summary: Highly available Nimbus
                 Key: STORM-166
                 URL: https://issues.apache.org/jira/browse/STORM-166
             Project: Apache Storm (Incubating)
          Issue Type: New Feature
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/360

The goal of this feature is to be able to run multiple Nimbus servers so that 
if one goes down another one will transparently take over. Here's what needs to 
happen to implement this:

1. Everything currently stored on local disk on Nimbus needs to be stored in a 
distributed and reliable fashion. A DFS is perfect for this. However, as we do 
not want to make a DFS a mandatory requirement to run Storm, the storage of 
these artifacts should be pluggable (default to local filesystem, but the 
interface should support DFS). You would only be able to run multiple NImbus if 
you use the right storage, and the storage interface chosen should have a flag 
indicating whether it's suitable for HA mode or not. If you choose local 
storage and try to run multiple Nimbus, one of the Nimbus's should fail to 
launch.

2. Nimbus's should register themselves in Zookeeper. They should use a leader 
election protocol to decide which one is currently responsible for launching 
and monitoring topologies.

3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case 
the leader changes during submission, it should use a retry protocol to try 
reconnecting to the new leader and attempting submission again.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to