Priyank Shah created STORM-1451:
-----------------------------------
Summary: Storm topology submission can take up to 5 minutes in HA
mode when ZooKeeper reconnects. Nimbus discovery can fail when a ZooKeeper
reconnect happens.
Key: STORM-1451
URL: https://issues.apache.org/jira/browse/STORM-1451
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Reporter: Priyank Shah
We discovered a couple of issues when testing Storm on Vagrant clusters.
1. When a Nimbus host's ZooKeeper connection is dropped and re-established, the
ephemeral entry for that host under /nimbuses is deleted and is not
automatically recreated on reconnection. As a result, even though Nimbus is
up, no client can actually discover it. To address this, we now have a
listener that listens for RECONNECT events and recreates the entry.
2. ZooKeeper is eventually consistent when multiple clients are involved. In
practice we did not notice this issue, but in the resource-constrained Vagrant
cluster it was evident that updates made by the leader Nimbus were not
observed by the other Nimbus hosts unless they waited a few seconds. Because
of this, topology submission can take up to 5 minutes, which is a very bad
user experience.
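
For illustration, the listener fix from item 1 can be sketched as below. This is a minimal in-memory simulation, not the actual Storm patch and not a real ZooKeeper client: FakeZooKeeper, NimbusRegistration, the ConnectionState enum, and the host name "nimbus1:6627" are all hypothetical names chosen for the example. It also assumes, as described in item 1, that a dropped connection removes the ephemeral entry.

```python
from enum import Enum


class ConnectionState(Enum):
    """Hypothetical connection states, loosely modelled on a ZooKeeper client."""
    CONNECTED = "CONNECTED"
    LOST = "LOST"
    RECONNECTED = "RECONNECTED"


class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper (assumption: for illustration only)."""

    def __init__(self):
        self.ephemeral = set()   # paths of live ephemeral nodes
        self.listeners = []      # callbacks invoked on every state change

    def create_ephemeral(self, path):
        self.ephemeral.add(path)

    def exists(self, path):
        return path in self.ephemeral

    def set_state(self, state):
        # Simplification: a lost connection drops all ephemeral nodes,
        # matching the behaviour reported in item 1.
        if state is ConnectionState.LOST:
            self.ephemeral.clear()
        for listener in list(self.listeners):
            listener(state)


class NimbusRegistration:
    """Keeps this host's entry under /nimbuses alive across reconnects."""

    def __init__(self, zk, host_port):
        self.zk = zk
        self.path = "/nimbuses/" + host_port
        zk.listeners.append(self.on_state_change)
        self.register()

    def register(self):
        # Idempotent: only create the entry if it is missing.
        if not self.zk.exists(self.path):
            self.zk.create_ephemeral(self.path)

    def on_state_change(self, state):
        # The fix: recreate the entry whenever the connection comes back.
        if state is ConnectionState.RECONNECTED:
            self.register()


if __name__ == "__main__":
    zk = FakeZooKeeper()
    NimbusRegistration(zk, "nimbus1:6627")
    zk.set_state(ConnectionState.LOST)
    print("after loss:", zk.exists("/nimbuses/nimbus1:6627"))
    zk.set_state(ConnectionState.RECONNECTED)
    print("after reconnect:", zk.exists("/nimbuses/nimbus1:6627"))
```

Without the on_state_change callback, the entry would stay missing after the reconnect, which is the discovery failure described in item 1. Making register() idempotent keeps the callback safe to run even if the entry survived.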