Itai Frenkel created STORM-526:
----------------------------------
Summary: Nimbus triggered complete removal of all topologies due
to maintenance in 2 out of 3 zookeeper servers
Key: STORM-526
URL: https://issues.apache.org/jira/browse/STORM-526
Project: Apache Storm
Issue Type: Bug
Affects Versions: 0.9.2-incubating
Environment: AWS EC2 ubuntu
Reporter: Itai Frenkel
We use a cluster of 3 ZooKeeper servers; all 3 IP addresses are listed in the
storm.yaml file.
We restarted one ZooKeeper server and, once it was ready, restarted the
second. The third ZooKeeper server stayed "green" the whole time (as monitored
by Netflix Exhibitor).
At the same time, Nimbus "decided" to remove all topologies (the log entry is
"Corrupt topology my-topology-xxx has state on zookeeper but doesn't have a
local dir on Nimbus. Cleaning up...").
I looked at the relevant code and I am not entirely sure the log message
correctly describes what the code does.
Could anyone please read nimbus.clj#cleanup-corrupt-topologies and explain
under what conditions Nimbus acts this way?
https://github.com/apache/storm/blob/v0.9.2-incubating/storm-core/src/clj/backtype/storm/daemon/nimbus.clj#L854
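My reading of that function, which may well be wrong, is that it takes the set
of topology ids that are active in ZooKeeper, subtracts the ids that have a
code directory on the Nimbus local disk, and removes the ZooKeeper state of
whatever is left. A minimal Python sketch of that reading follows. The function
name and the two input lists are made up for illustration and are not Storm's
API; the topology ids are the ones from the log below.

```python
def find_corrupt_topologies(active_in_zookeeper, local_code_dirs):
    """Return ids that have state in ZooKeeper but no local code dir.

    Hypothetical helper: a plain set difference, which is how I read
    cleanup-corrupt-topologies. Not taken from Storm itself.
    """
    return sorted(set(active_in_zookeeper) - set(local_code_dirs))


# Ids as they appear in the Nimbus log.
active = [
    "my-topology-0-1-1412151059",
    "my-topology-1-2-1412151059",
    "my-topology-2-3-1412151060",
    "my-topology-3-4-1412151062",
]

# Assumption for illustration only: the local listing came back empty.
local_dirs = []

for topology_id in find_corrupt_topologies(active, local_dirs):
    print("would clean up", topology_id)
```

If this reading is right, any situation in which the local listing is empty or
incomplete while ZooKeeper still reports the topologies as active would flag
every topology as corrupt, which would match what we saw.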
Log file:
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology
my-topology-1-2-1412151059 has state on zookeeper but doesn't have a local dir
on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology
my-topology-0-1-1412151059 has state on zookeeper but doesn't have a local dir
on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology
my-topology-3-4-1412151062 has state on zookeeper but doesn't have a local dir
on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Corrupt topology
my-topology-2-3-1412151060 has state on zookeeper but doesn't have a local dir
on Nimbus. Cleaning up...
2014-10-01 10:47:19 b.s.d.nimbus [INFO] Starting Nimbus server...
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-1-2-1412151059
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-0-1-1412151059
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-3-4-1412151062
2014-10-01 10:47:20 b.s.d.nimbus [INFO] Cleaning up my-topology-2-3-1412151060
2014-10-01 10:52:16 b.s.d.nimbus [INFO] Shutting down master