Priyank Shah created STORM-1451:
-----------------------------------

             Summary: Storm topology submission can take up to 5 minutes in HA 
mode when ZooKeeper reconnects. Nimbus discovery can fail when a ZooKeeper 
reconnect happens.
                 Key: STORM-1451
                 URL: https://issues.apache.org/jira/browse/STORM-1451
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
            Reporter: Priyank Shah


We discovered a couple of issues when testing Storm on Vagrant clusters.
1. When a nimbus's ZooKeeper connection is dropped and re-established, the 
ephemeral entry for that host under /nimbuses is deleted and is not 
automatically recreated on reconnection. This means that even though nimbus is 
up, no client can actually discover it. To address this, we now have a 
listener that listens for RECONNECT events and recreates the entry.
2. ZooKeeper is eventually consistent when multiple clients are involved. In 
practice we had not noticed this, but in the resource-constrained Vagrant 
cluster it was evident that updates made by the leader nimbus were not 
observed by the other nimbus hosts unless they waited a few seconds. As a 
result, topology submission can take up to 5 minutes, which is a very poor 
user experience.
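
The listener described in item 1 can be sketched roughly as follows. This is 
a minimal in-memory model, not the actual patch: ConnState, NimbusRegistry, 
ReRegisterOnReconnect and the host:port string are all hypothetical names 
introduced for illustration. In a Curator-based client the equivalent hook 
would presumably be a ConnectionStateListener reacting to 
ConnectionState.RECONNECTED.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the connection states a ZooKeeper client reports.
enum ConnState { CONNECTED, LOST, RECONNECTED }

// Hypothetical in-memory model of the ephemeral children under /nimbuses.
class NimbusRegistry {
    private final Set<String> entries = ConcurrentHashMap.newKeySet();

    // Adds the entry for a nimbus host (a no-op if it already exists).
    void register(String hostPort) { entries.add(hostPort); }

    // Models the server deleting a host's ephemeral entry.
    void drop(String hostPort) { entries.remove(hostPort); }

    // A client can discover a nimbus only while its entry exists.
    boolean isDiscoverable(String hostPort) { return entries.contains(hostPort); }
}

// Recreates this nimbus's entry whenever a RECONNECTED event arrives.
class ReRegisterOnReconnect {
    private final NimbusRegistry registry;
    private final String hostPort;

    ReRegisterOnReconnect(NimbusRegistry registry, String hostPort) {
        this.registry = registry;
        this.hostPort = hostPort;
    }

    void stateChanged(ConnState newState) {
        // Only a reconnect triggers re-registration; other states are ignored.
        if (newState == ConnState.RECONNECTED && !registry.isDiscoverable(hostPort)) {
            registry.register(hostPort);
        }
    }
}

class NimbusReconnectDemo {
    public static void main(String[] args) {
        String host = "nimbus1:6627"; // illustrative host:port, an assumption
        NimbusRegistry registry = new NimbusRegistry();
        ReRegisterOnReconnect listener = new ReRegisterOnReconnect(registry, host);

        registry.register(host);               // initial registration at startup
        registry.drop(host);                   // entry deleted after the connection drops
        listener.stateChanged(ConnState.LOST);
        System.out.println("after loss: " + registry.isDiscoverable(host));

        listener.stateChanged(ConnState.RECONNECTED);
        System.out.println("after reconnect: " + registry.isDiscoverable(host));
    }
}
```

Checking whether the entry already exists before registering keeps the 
listener idempotent, so repeated RECONNECT events do not cause harm.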



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
