Itai Frenkel created STORM-526:
----------------------------------

             Summary: Nimbus triggered complete removal of all topologies due to maintenance in 2 out of 3 zookeeper servers
                 Key: STORM-526
                 URL: https://issues.apache.org/jira/browse/STORM-526
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 0.9.2-incubating
         Environment: AWS EC2 ubuntu
            Reporter: Itai Frenkel


We run a cluster of 3 ZooKeeper servers, and all 3 IP addresses are listed in 
the storm.yaml file. We restarted one ZooKeeper server and, once it was ready, 
restarted the second. The whole time, the third ZooKeeper server was "green" 
(as monitored by Netflix Exhibitor).
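
For context, here is a minimal sketch of the quorum arithmetic behind this 
rolling restart, assuming ZooKeeper's default majority quorum. The function 
names are illustrative only and are not part of ZooKeeper or Storm.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True when enough servers are up to form a majority quorum."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # 3-server ensemble, one server restarting at a time: 2 remain up.
    print(has_quorum(3, 2))
    # If the two restarts had overlapped, only 1 server would remain up.
    print(has_quorum(3, 1))
```

Under that assumption, restarting one server at a time should leave a quorum 
in place, provided the first server had fully rejoined the ensemble before the 
second one went down.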

At the same time, Nimbus "decided" to remove all topologies (the log entry is 
"Corrupt topology my-topology-xxx has state on zookeeper but doesn't have a 
local dir on Nimbus. Cleaning up...").

I looked at the relevant code, and I am not entirely sure the log message 
correctly describes what the code does.

Could anyone please read nimbus.clj#cleanup-corrupt-topologies and explain 
under what conditions Nimbus acts this way?
https://github.com/apache/storm/blob/v0.9.2-incubating/storm-core/src/clj/backtype/storm/daemon/nimbus.clj#L854
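
Going only by the wording of the log message, the check seems to amount to a 
set difference: topology ids that have state in ZooKeeper minus topology ids 
that have a local directory on Nimbus. The sketch below illustrates that 
reading in Python. It is not the Clojure code from nimbus.clj, and the 
function and variable names are hypothetical.

```python
def find_corrupt_topologies(zk_topology_ids, local_topology_dirs):
    """Return ids that have state in ZooKeeper but no local dir on Nimbus.

    This mirrors the wording of the log message only; it is an assumption
    about the behaviour, not a port of the real Nimbus code.
    """
    return sorted(set(zk_topology_ids) - set(local_topology_dirs))


if __name__ == "__main__":
    zk_ids = ["my-topology-0-1-1412151059", "my-topology-1-2-1412151059"]
    # Hypothetical case: Nimbus finds no local topology directories.
    local_dirs = []
    for topology_id in find_corrupt_topologies(zk_ids, local_dirs):
        print(f"would clean up {topology_id}")
```

If the check does work this way, an empty or unreadable local directory 
listing would flag every topology at once. That would match the log below, but 
it is only a guess until someone confirms it against the actual code.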


Log file:
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology my-topology-1-2-1412151059 has state on zookeeper but doesn't have a local dir on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology my-topology-0-1-1412151059 has state on zookeeper but doesn't have a local dir on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology my-topology-3-4-1412151062 has state on zookeeper but doesn't have a local dir on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology my-topology-2-3-1412151060 has state on zookeeper but doesn't have a local dir on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Starting Nimbus server...
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-1-2-1412151059
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-0-1-1412151059
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-3-4-1412151062
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-2-3-1412151060
2014-10-01 10:52:16 b.s.d.nimbus [INFO] Shutting down master






