GitHub user danielschonfeld commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148807939

@Parth-Brahmbhatt that's a tricky one. I haven't found a way to reproduce it, but after leaving nimbus running for a day or so with more than one nimbus and a good load on the system, we see the number of ZK nodes/keys under /leader-lock go up to (X*nimbuses)+1. When that happens, we have problems trying to do anything, because no nimbus thinks it's the leader, which is exactly what's described in CURATOR-202.

If you can think of a way to drop the ZK connection but reconnect using the same session programmatically, you'll have a reproduction of this bug, since it always starts showing up after something like the following log lines:

```
2015-10-16 18:16:13 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
2015-10-16 18:16:14 o.a.s.s.o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 6668ms for sessionid 0x1506caf14ab005f, closing socket connection and attempting reconnect
2015-10-16 18:16:14 o.a.s.s.o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 6672ms for sessionid 0x1506caf14ab0060, closing socket connection and attempting reconnect
2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server 10.101.1.2/10.101.1.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to 10.101.1.2/10.101.1.2:2181, initiating session
2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server 10.101.1.2/10.101.1.2:2181, sessionid = 0x1506caf14ab005f, negotiated timeout = 20000
2015-10-16 18:16:15 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
```
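
In case it helps with spotting the precursor in existing nimbus logs, here is a rough sketch that lists the session IDs that timed out and were then re-established under the same ID. It does not reproduce the bug. The function name `sessions_resumed_after_timeout` is made up for illustration, and the two regexes are derived only from the log lines quoted above, so they are an assumption about the message format and may need adjusting for other ZooKeeper client versions.

```python
import re

# Patterns assumed from the log excerpt above; not taken from ZooKeeper's source.
TIMED_OUT = re.compile(
    r"Client session timed out, .* for sessionid (0x[0-9a-fA-F]+)"
)
ESTABLISHED = re.compile(
    r"Session establishment complete on server .*, sessionid = (0x[0-9a-fA-F]+)"
)


def sessions_resumed_after_timeout(lines):
    """Return session IDs that timed out and later reconnected with the same ID.

    IDs are returned in the order their reconnect was first seen, without repeats.
    """
    timed_out = set()
    resumed = []
    for line in lines:
        match = TIMED_OUT.search(line)
        if match:
            # Remember every session whose socket was closed on a timeout.
            timed_out.add(match.group(1))
            continue
        match = ESTABLISHED.search(line)
        if match:
            session_id = match.group(1)
            # Same ID after a timeout means the session itself survived.
            if session_id in timed_out and session_id not in resumed:
                resumed.append(session_id)
    return resumed
```

Feeding it the lines of a nimbus log (for example, the result of `open(path).read().splitlines()`) gives the sessions worth checking against the time the extra nodes appeared under /leader-lock.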