Robert Joseph Evans created STORM-2513:
------------------------------------------
Summary: NPE possible in getLeader call
Key: STORM-2513
URL: https://issues.apache.org/jira/browse/STORM-2513
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Affects Versions: 1.0.0, 2.0.0, 1.1.0
Reporter: Robert Joseph Evans
The getLeader call actually reads data from two different locations
https://github.com/apache/storm/blob/v1.1.0/storm-core/src/clj/org/apache/storm/daemon/nimbus.clj#L2371-L2385
One is /leader-lock and the other is /nimbuses. There is a rare race in which
these two get out of sync: the leader crashes, we read from leader election
and are told it is still the leader, but after that its entry is removed from
/nimbuses in ZK. So we either need to stop keeping them as separate entries,
or we need to add some kind of retry when this happens.
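
As a rough illustration of the retry option, here is a minimal sketch in
plain Java. It is not the nimbus.clj code and uses no Storm or ZooKeeper API:
the class LeaderLookup, the method getLeaderWithRetry, and both reader
arguments are hypothetical stand-ins for "read the leader id from
/leader-lock" and "look up that id under /nimbuses". Instead of
dereferencing a missing entry, it re-reads both locations a bounded number
of times.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch only: the two readers stand in for the
// /leader-lock and /nimbuses lookups and are not Storm APIs.
class LeaderLookup {
    static <T> T getLeaderWithRetry(Supplier<String> leaderLockReader,
                                    Function<String, T> nimbusesReader,
                                    int maxAttempts,
                                    long sleepMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-read the elected leader on every attempt, since a new
            // election may have completed in the meantime.
            String leaderId = leaderLockReader.get();
            if (leaderId != null) {
                T details = nimbusesReader.apply(leaderId);
                if (details != null) {
                    return details; // both locations agree
                }
            }
            // The two reads disagreed (or no leader yet): back off, retry.
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while looking up leader", e);
                }
            }
        }
        throw new IllegalStateException(
            "No consistent leader found after " + maxAttempts + " attempts");
    }
}
```

The point of the sketch is that a stale leader id turns into a bounded retry
and then a descriptive exception rather than an NPE. The attempt count and
sleep interval are placeholders.
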
Also, NimbusClient has no retry built in. Not all operations are idempotent,
but we really should look at adding a retry, possibly switching to a new
nimbus, for idempotent operations.
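
One possible shape for such a retry, again as a hedged sketch in plain Java
rather than a change to NimbusClient: FailoverCaller and its call method are
hypothetical, the client type C is left generic, and the idempotent flag is
assumed to be supplied by the caller. Idempotent operations are tried against
each candidate nimbus in turn; everything else is attempted exactly once.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: C stands in for some nimbus client handle;
// nothing here is part of the real NimbusClient API.
class FailoverCaller {
    static <C, R> R call(List<C> clients, boolean idempotent, Function<C, R> op) {
        if (clients.isEmpty()) {
            throw new IllegalArgumentException("no nimbus clients to try");
        }
        if (!idempotent) {
            // Not safe to repeat: one attempt, failures propagate as-is.
            return op.apply(clients.get(0));
        }
        RuntimeException last = null;
        for (C client : clients) {
            try {
                return op.apply(client);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next nimbus
            }
        }
        throw last; // every candidate failed; surface the last error
    }
}
```

Which operations count as idempotent would have to be decided per call; the
sketch leaves that decision to the caller.
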