[ 
https://issues.apache.org/jira/browse/STORM-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965119#comment-13965119
 ] 

ASF GitHub Bot commented on STORM-166:
--------------------------------------

Github user yveschina commented on a diff in the pull request:

    https://github.com/apache/incubator-storm/pull/61#discussion_r11474408
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/nimbus.clj ---
    @@ -894,10 +895,47 @@
       )
     )
     
    +(defn- sync-storm-code-from-leader [nimbus]
    +  (let [conf (:conf nimbus)
    +        storm-cluster-state (:storm-cluster-state nimbus)
    +        storm-ids (.assignments storm-cluster-state nil)
    +        storm-code-map (->> (dofor [sid storm-ids] {sid (.assignment-info storm-cluster-state sid nil)})
    +                            (apply merge)
    +                            (filter-val not-nil?)
    +                            (map-val :master-code-dir)
    +                            )
    +        downloaded-storm-ids (set (map #(java.net.URLDecoder/decode %) (read-dir-contents (master-stormdist-root conf))))
    +        tmproot (str (master-tmp-dir conf) file-path-separator (uuid))]
    +    (doseq [[storm-id master-code-dir] storm-code-map]
    +        (when (not (downloaded-storm-ids storm-id))
    +          (log-message "Downloading code for storm id " storm-id " from " master-code-dir)
    +          
    +          (FileUtils/forceMkdir (File. tmproot))
    +          (Utils/downloadFromMaster conf (master-stormjar-path master-code-dir) (master-stormjar-path tmproot))
    +          (Utils/downloadFromMaster conf (master-stormcode-path master-code-dir) (master-stormcode-path tmproot))
    +          (Utils/downloadFromMaster conf (master-stormconf-path master-code-dir) (master-stormconf-path tmproot))
    +          (FileUtils/moveDirectory (File. tmproot) (File. (master-stormdist-root conf storm-id)))
    +          
    +          (log-message "Finished downloading code for storm id " storm-id " from " master-code-dir)
    +         )
    +     )
    +  )
    +)
    +
     (defserverfn service-handler [conf inimbus]
       (.prepare inimbus conf (master-inimbus-dir conf))
       (log-message "Starting Nimbus with conf " conf)
    -  (let [nimbus (nimbus-data conf inimbus)]
    +  (let [nimbus (nimbus-data conf inimbus)
    +        nimbus-leadership (nimbus-leadership conf)]
    --- End diff --
    
    On this point, I looked at the InterProcessMutex source code. I believe
    that when a network glitch causes mutex.acquire() to lose its ZooKeeper
    connection, mutex.acquire() will throw an IOException, which eventually
    causes this Nimbus to shut down. Is there any other scenario that could
    result in two or more Nimbus instances?
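
The failure path described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the code in the pull request: `LeaderLock`, `tryBecomeLeader`, and the `shutdown` callback are hypothetical names, and `LeaderLock` is a stand-in interface rather than Curator's actual `InterProcessMutex` API. It assumes, as the comment does, that a lost connection surfaces as an `IOException` from `acquire()`.

```java
import java.io.IOException;

final class LeaderLockSketch {

    /** Hypothetical stand-in for a ZooKeeper-backed lock; NOT Curator's real API. */
    interface LeaderLock {
        void acquire() throws IOException;
    }

    /**
     * Tries to take the leader lock. Returns true if the lock was acquired.
     * If acquire() fails with an IOException (assumed here to signal a lost
     * connection), runs the shutdown callback and returns false.
     */
    static boolean tryBecomeLeader(LeaderLock lock, Runnable shutdown) {
        try {
            lock.acquire();
            return true;
        } catch (IOException e) {
            // Assumption from the comment: a connection failure ends this instance.
            shutdown.run();
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated lock whose acquire() always fails, as if the connection dropped.
        LeaderLock broken = () -> { throw new IOException("connection lost"); };
        boolean leader = tryBecomeLeader(broken, () -> System.out.println("shutting down"));
        System.out.println("leader = " + leader);
    }
}
```

The sketch models only the path the comment describes, where the failure is raised while `acquire()` is being called.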


> Highly available Nimbus
> -----------------------
>
>                 Key: STORM-166
>                 URL: https://issues.apache.org/jira/browse/STORM-166
>             Project: Apache Storm (Incubating)
>          Issue Type: New Feature
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/360
> The goal of this feature is to be able to run multiple Nimbus servers so that 
> if one goes down another one will transparently take over. Here's what needs 
> to happen to implement this:
> 1. Everything currently stored on local disk on Nimbus needs to be stored in 
> a distributed and reliable fashion. A DFS is perfect for this. However, as we 
> do not want to make a DFS a mandatory requirement to run Storm, the storage 
> of these artifacts should be pluggable (default to local filesystem, but the 
> interface should support DFS). You would only be able to run multiple Nimbus
> servers if you use the right storage, and the storage interface chosen should
> have a flag indicating whether it's suitable for HA mode or not. If you choose
> local storage and try to run multiple Nimbus servers, one of them should fail
> to launch.
> 2. Nimbus servers should register themselves in Zookeeper. They should use a
> leader election protocol to decide which one is currently responsible for
> launching and monitoring topologies.
> 3. StormSubmitter should find the Nimbus to connect to via Zookeeper. In case 
> the leader changes during submission, it should use a retry protocol to try 
> reconnecting to the new leader and attempting submission again.
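
The retry protocol in item 3 could look roughly like the sketch below. It is an illustration under stated assumptions, not StormSubmitter's actual implementation: `Submitter`, `submitWithRetry`, the `leaderLookup` supplier (standing in for a Zookeeper lookup), and the host names are all hypothetical. The key idea is that the leader is looked up again before every attempt, so a leader change between attempts is picked up.

```java
import java.util.function.Supplier;

final class LeaderRetrySketch {

    /** Hypothetical submission call; fails if the contacted Nimbus is not the leader. */
    public interface Submitter {
        void submitTo(String leaderAddress) throws Exception;
    }

    /**
     * Looks up the current leader before each attempt and submits to it.
     * Returns the number of attempts used, or throws after maxAttempts failures.
     */
    static int submitWithRetry(Supplier<String> leaderLookup, Submitter submitter,
                               int maxAttempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-resolve on every attempt: the leader may have changed.
            String leader = leaderLookup.get();
            try {
                submitter.submitTo(leader);
                return attempt;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // fixed backoff, chosen for simplicity
                }
            }
        }
        throw new Exception("submission failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        // Simulated failover: the first lookup returns the old leader, later ones the new leader.
        int[] lookups = {0};
        Supplier<String> lookup = () -> lookups[0]++ == 0 ? "nimbus-a" : "nimbus-b";
        Submitter submitter = addr -> {
            if (!addr.equals("nimbus-b")) {
                throw new IllegalStateException(addr + " is not the leader");
            }
        };
        int attempts = submitWithRetry(lookup, submitter, 3, 10L);
        System.out.println("submitted after " + attempts + " attempt(s)");
    }
}
```

A fixed backoff and a bounded attempt count keep the sketch short; a real client would likely need to distinguish "not leader" errors from other submission failures.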



