Robert Joseph Evans created STORM-2513:
------------------------------------------

             Summary: NPE possible in getLeader call
                 Key: STORM-2513
                 URL: https://issues.apache.org/jira/browse/STORM-2513
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.0, 2.0.0, 1.1.0
            Reporter: Robert Joseph Evans


The getLeader call actually reads data from two different locations in ZooKeeper:

https://github.com/apache/storm/blob/v1.1.0/storm-core/src/clj/org/apache/storm/daemon/nimbus.clj#L2371-L2385

One is /leader-lock and the other is /nimbuses. In rare cases these two can get
out of sync: when the leader crashes, leader election can still report it as
the leader, but its entry has already been removed from /nimbuses in ZK by the
time we read it. So we either need to merge them into a single entry, or we
need to add some kind of retry when this happens.
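A minimal sketch of the retry option, in Java. It assumes two hypothetical readers, leaderLockReader for /leader-lock and nimbusesReader for /nimbuses; these and the NimbusInfo holder are illustrative names, not Storm's actual APIs. When the leader id from the lock has no matching /nimbuses entry, the sketch backs off, re-reads both locations, and after a bounded number of attempts fails with a descriptive exception instead of an NPE.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not Storm's real getLeader implementation.
final class LeaderLookup {
    // Hypothetical stand-in for the data stored in one /nimbuses entry.
    static final class NimbusInfo {
        final String host;
        final int port;

        NimbusInfo(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    private final Supplier<String> leaderLockReader;                // reads /leader-lock; may return null
    private final Supplier<Map<String, NimbusInfo>> nimbusesReader; // reads /nimbuses as id -> info
    private final int maxAttempts;
    private final long backoffMillis;

    LeaderLookup(Supplier<String> leaderLockReader,
                 Supplier<Map<String, NimbusInfo>> nimbusesReader,
                 int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.leaderLockReader = leaderLockReader;
        this.nimbusesReader = nimbusesReader;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    NimbusInfo getLeader() {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String leaderId = leaderLockReader.get();
            if (leaderId != null) {
                NimbusInfo info = nimbusesReader.get().get(leaderId);
                if (info != null) {
                    return info; // both locations agree on the leader
                }
            }
            // The two reads disagree (e.g. the leader crashed in between):
            // back off, then re-read both locations on the next attempt.
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while looking up leader", e);
                }
            }
        }
        // Fail loudly with context rather than returning null and causing an NPE later.
        throw new IllegalStateException(
                "No consistent leader found after " + maxAttempts + " attempt(s)");
    }
}
```

Re-reading both locations on each attempt (rather than only /nimbuses) lets the lookup pick up a newly elected leader once the old one's lock is released.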

Also, NimbusClient has no retry built in. Not all operations are idempotent,
but we really should look at adding a retry for idempotent operations, possibly
switching to a different nimbus on each attempt.
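A minimal sketch of such a retry, assuming a hypothetical FailoverRetry helper and NimbusCall interface (neither is part of NimbusClient). Callers mark each operation as idempotent or not; idempotent operations are retried round-robin across the configured nimbus hosts, while non-idempotent ones get exactly one attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of Storm's NimbusClient.
final class FailoverRetry {
    // Hypothetical operation against a single nimbus host.
    @FunctionalInterface
    interface NimbusCall<T> {
        T call(String host) throws Exception;
    }

    private final List<String> hosts;
    private final int maxAttempts;

    FailoverRetry(List<String> hosts, int maxAttempts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one nimbus host is required");
        }
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.hosts = new ArrayList<>(hosts);
        this.maxAttempts = maxAttempts;
    }

    <T> T execute(NimbusCall<T> op, boolean idempotent) {
        // Only idempotent operations are safe to repeat.
        int attempts = idempotent ? maxAttempts : 1;
        RuntimeException failure =
                new RuntimeException("Nimbus call failed after " + attempts + " attempt(s)");
        for (int i = 0; i < attempts; i++) {
            // Switch to the next nimbus host on each attempt (round-robin).
            String host = hosts.get(i % hosts.size());
            try {
                return op.call(host);
            } catch (Exception e) {
                // Simplification: every exception is treated as retryable.
                failure.addSuppressed(e);
            }
        }
        throw failure;
    }
}
```

Treating every exception as retryable keeps the sketch short; a real client would likely retry only on transport-level failures and pass application errors straight through.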



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
