[
https://issues.apache.org/jira/browse/STORM-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965428#comment-13965428
]
ASF GitHub Bot commented on STORM-166:
--------------------------------------
Github user revans2 commented on a diff in the pull request:
https://github.com/apache/incubator-storm/pull/61#discussion_r11489306
--- Diff: storm-core/src/clj/backtype/storm/daemon/nimbus.clj ---
@@ -894,10 +895,47 @@
)
)
+(defn- sync-storm-code-from-leader [nimbus]
+ (let [conf (:conf nimbus)
+ storm-cluster-state (:storm-cluster-state nimbus)
+ storm-ids (.assignments storm-cluster-state nil)
+ storm-code-map (->> (dofor [sid storm-ids] {sid (.assignment-info storm-cluster-state sid nil)})
+ (apply merge)
+ (filter-val not-nil?)
+ (map-val :master-code-dir)
+ )
+ downloaded-storm-ids (set (map #(java.net.URLDecoder/decode %) (read-dir-contents (master-stormdist-root conf))))
+ tmproot (str (master-tmp-dir conf) file-path-separator (uuid))]
+ (doseq [[storm-id master-code-dir] storm-code-map]
+ (when (not (downloaded-storm-ids storm-id))
+ (log-message "Downloading code for storm id " storm-id " from " master-code-dir)
+
+ (FileUtils/forceMkdir (File. tmproot))
+ (Utils/downloadFromMaster conf (master-stormjar-path master-code-dir) (master-stormjar-path tmproot))
+ (Utils/downloadFromMaster conf (master-stormcode-path master-code-dir) (master-stormcode-path tmproot))
+ (Utils/downloadFromMaster conf (master-stormconf-path master-code-dir) (master-stormconf-path tmproot))
+ (FileUtils/moveDirectory (File. tmproot) (File. (master-stormdist-root conf storm-id)))
+
+ (log-message "Finished downloading code for storm id " storm-id " from " master-code-dir)
+ )
+ )
+ )
+)
+
(defserverfn service-handler [conf inimbus]
(.prepare inimbus conf (master-inimbus-dir conf))
(log-message "Starting Nimbus with conf " conf)
- (let [nimbus (nimbus-data conf inimbus)]
+ (let [nimbus (nimbus-data conf inimbus)
+ nimbus-leadership (nimbus-leadership conf)]
--- End diff --
My concern isn't with the acquire itself; it is with what happens after the
acquire. The current code opens two connections to ZK: one is used for the
mutex, the other for all other interaction with ZK. Suppose the active nimbus,
having already passed mutex.acquire(), hits a networking glitch that drops
only the mutex connection. I don't see how that would cause the currently
active nimbus to get an IOException and shut down.
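
To make the scenario concrete, here is a minimal sketch of one way the active
nimbus could notice that the mutex connection went away, assuming the ZK
client used for the mutex reports connection-state changes through a callback
(Curator's ConnectionStateListener is one such mechanism). Everything named
below (LeadershipGuard, ConnState, mutexConnectionChanged, the step-down
callback) is hypothetical and not taken from the pull request.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: tracks whether this process may still act as leader,
 * based on the state of the connection that holds the mutex.
 */
public class LeadershipGuard {

    /** Hypothetical connection states, loosely modelled on what a ZK client reports. */
    public enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    private final AtomicBoolean leader = new AtomicBoolean(false);
    private final Runnable onLeadershipLost;

    public LeadershipGuard(Runnable onLeadershipLost) {
        this.onLeadershipLost = onLeadershipLost;
    }

    /** Call once the mutex acquire has returned successfully. */
    public void acquired() {
        leader.set(true);
    }

    /**
     * Call from the connection-state callback of the client that owns the mutex.
     * Assumption: a suspended or lost connection means the lock can no longer
     * be trusted, so the process steps down instead of waiting for an error.
     */
    public void mutexConnectionChanged(ConnState state) {
        boolean unsafe = state == ConnState.SUSPENDED || state == ConnState.LOST;
        // compareAndSet ensures the step-down action runs at most once per acquire.
        if (unsafe && leader.compareAndSet(true, false)) {
            onLeadershipLost.run();
        }
    }

    public boolean isLeader() {
        return leader.get();
    }

    public static void main(String[] args) {
        LeadershipGuard guard = new LeadershipGuard(
                () -> System.out.println("mutex connection lost, stepping down"));
        guard.acquired();
        guard.mutexConnectionChanged(ConnState.LOST);
        System.out.println("still leader? " + guard.isLeader());
    }
}
```

Whether SUSPENDED should already trigger a step-down, or only LOST, is a
trade-off between availability and the risk of two active nimbus processes;
the sketch takes the conservative choice.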
> Highly available Nimbus
> -----------------------
>
> Key: STORM-166
> URL: https://issues.apache.org/jira/browse/STORM-166
> Project: Apache Storm (Incubating)
> Issue Type: New Feature
> Reporter: James Xu
> Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/360
> The goal of this feature is to be able to run multiple Nimbus servers so that
> if one goes down another one will transparently take over. Here's what needs
> to happen to implement this:
> 1. Everything currently stored on local disk on Nimbus needs to be stored in
> a distributed and reliable fashion. A DFS is perfect for this. However, as we
> do not want to make a DFS a mandatory requirement to run Storm, the storage
> of these artifacts should be pluggable (default to local filesystem, but the
> interface should support DFS). You would only be able to run multiple Nimbus
> servers if you use the right storage, and the storage interface chosen should
> have a flag indicating whether it's suitable for HA mode or not. If you choose
> local storage and try to run multiple Nimbus servers, one of them should fail
> to launch.
> 2. Nimbus servers should register themselves in Zookeeper. They should use a leader
> election protocol to decide which one is currently responsible for launching
> and monitoring topologies.
> 3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case
> the leader changes during submission, it should use a retry protocol to try
> reconnecting to the new leader and attempting submission again.
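
The retry protocol in step 3 could look roughly like the sketch below. It is
not StormSubmitter code: LeaderRetry, NotLeaderException, Submission and
submitWithRetry are hypothetical names, and the leader lookup is passed in as
a plain Supplier standing in for a read of the leader's address from
Zookeeper.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of "look up the leader, submit, retry if the leader changed". */
public class LeaderRetry {

    /** Hypothetical: signals that the contacted Nimbus is not (or is no longer) the leader. */
    public static class NotLeaderException extends RuntimeException {
        public NotLeaderException(String address) {
            super("not the leader: " + address);
        }
    }

    /** Hypothetical: one submission attempt against a given Nimbus address. */
    public interface Submission {
        void submitTo(String leaderAddress);
    }

    /**
     * Re-reads the leader address before every attempt, so a leader change
     * during submission is picked up on the next try.
     * Returns the address that accepted the submission.
     */
    public static String submitWithRetry(Supplier<String> leaderLookup,
                                         Submission submission,
                                         int maxAttempts,
                                         long backoffMillis) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String leader = leaderLookup.get();
            try {
                submission.submitTo(leader);
                return leader;
            } catch (NotLeaderException e) {
                last = e;
                try {
                    // Linear backoff gives the election time to settle.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw new IllegalStateException(
                "no leader accepted the submission after " + maxAttempts + " attempts", last);
    }
}
```

The attempt limit and backoff are arbitrary parameters here; a real client
would also need to decide whether a submission that failed halfway is safe to
repeat.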