[
https://issues.apache.org/jira/browse/STORM-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962117#comment-13962117
]
ASF GitHub Bot commented on STORM-166:
--------------------------------------
Github user revans2 commented on a diff in the pull request:
https://github.com/apache/incubator-storm/pull/61#discussion_r11358486
--- Diff: storm-core/src/clj/backtype/storm/daemon/nimbus.clj ---
@@ -894,10 +895,47 @@
)
)
+(defn- sync-storm-code-from-leader [nimbus]
--- End diff --
What happens if the standby nimbus has not downloaded everything when it
becomes the master? The current code relies on the topology to be scheduled
before the standby downloads anything. But there is a period of time between
when the topology is submitted successfully and when the topology is scheduled
where if nimbus goes down for whatever reason the topology will be killed.
This is probably OK to start out with, but would be nice to fix.
> Highly available Nimbus
> -----------------------
>
> Key: STORM-166
> URL: https://issues.apache.org/jira/browse/STORM-166
> Project: Apache Storm (Incubating)
> Issue Type: New Feature
> Reporter: James Xu
> Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/360
> The goal of this feature is to be able to run multiple Nimbus servers so that
> if one goes down another one will transparently take over. Here's what needs
> to happen to implement this:
> 1. Everything currently stored on local disk on Nimbus needs to be stored in
> a distributed and reliable fashion. A DFS is perfect for this. However, as we
> do not want to make a DFS a mandatory requirement to run Storm, the storage
> of these artifacts should be pluggable (default to local filesystem, but the
> interface should support DFS). You would only be able to run multiple NImbus
> if you use the right storage, and the storage interface chosen should have a
> flag indicating whether it's suitable for HA mode or not. If you choose local
> storage and try to run multiple Nimbus, one of the Nimbus's should fail to
> launch.
> 2. Nimbus's should register themselves in Zookeeper. They should use a leader
> election protocol to decide which one is currently responsible for launching
> and monitoring topologies.
> 3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case
> the leader changes during submission, it should use a retry protocol to try
> reconnecting to the new leader and attempting submission again.
--
This message was sent by Atlassian JIRA
(v6.2#6252)