James Xu created STORM-166:
------------------------------
Summary: Highly available Nimbus
Key: STORM-166
URL: https://issues.apache.org/jira/browse/STORM-166
Project: Apache Storm (Incubating)
Issue Type: New Feature
Reporter: James Xu
Priority: Minor
https://github.com/nathanmarz/storm/issues/360
The goal of this feature is to be able to run multiple Nimbus servers so that
if one goes down another one will transparently take over. Here's what needs to
happen to implement this:
1. Everything currently stored on local disk on Nimbus needs to be stored in a
distributed and reliable fashion. A DFS is perfect for this. However, as we do
not want to make a DFS a mandatory requirement to run Storm, the storage of
these artifacts should be pluggable (default to local filesystem, but the
interface should support DFS). You would only be able to run multiple NImbus if
you use the right storage, and the storage interface chosen should have a flag
indicating whether it's suitable for HA mode or not. If you choose local
storage and try to run multiple Nimbus, one of the Nimbus's should fail to
launch.
2. Nimbus's should register themselves in Zookeeper. They should use a leader
election protocol to decide which one is currently responsible for launching
and monitoring topologies.
3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case
the leader changes during submission, it should use a retry protocol to try
reconnecting to the new leader and attempting submission again.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)