Satish Duggana created CURATOR-358:
--------------------------------------
Summary: Receiving KeeperException with NoNode when LeaderLatch#getLeader()
Key: CURATOR-358
URL: https://issues.apache.org/jira/browse/CURATOR-358
Project: Apache Curator
Issue Type: Bug
Components: Recipes
Affects Versions: 2.10.0
Reporter: Satish Duggana
Priority: Critical
org.apache.curator.framework.recipes.leader.LeaderLatch#getLeader()
intermittently throws KeeperException with Code#NONODE, as shown in the stack
trace below. A likely cause is that a participant's ephemeral ZK node is removed
because its connection/session was closed.
The relevant code is at
https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L451
    public Participant getLeader() throws Exception
    {
        Collection<String> participantNodes =
            LockInternals.getParticipantNodes(client, latchPath, LOCK_NAME, sorter);
        return LeaderSelector.getLeader(client, participantNodes);
    }
I suspect a race condition: the participant nodes are listed first, but by the
time LeaderSelector#getLeader() reads a node's data, that node may already have
been removed because of a session timeout, so the read throws KeeperException
with the NoNode code. The call is not retried because RetryLoop retries only
for connection/session timeouts. In this case, however, NoNode should have been
retried. I could not find any API on CuratorClient to configure which
KeeperException codes are retried. It may be good to let callers specify the
retryable error codes through the
org.apache.curator.framework.CuratorFrameworkFactory.Builder APIs.
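Until something like that exists, a caller-side workaround might look like the sketch below. It is not part of Curator: RetryingCall, callWithRetry and their parameters are hypothetical names chosen for illustration, and the helper uses only the JDK so it can be read and run on its own. The idea is simply to re-run the whole call when the thrown exception matches a caller-supplied predicate.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, not a Curator API: retries a call when the thrown
// exception is one the caller has declared retryable.
final class RetryingCall {
    private RetryingCall() {}

    static <T> T callWithRetry(Callable<T> call,
                               Predicate<Exception> isRetryable,
                               int maxAttempts,
                               long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up on the last attempt or on a non-retryable error.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                // Fixed pause before re-running the whole call.
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

With Curator and ZooKeeper on the classpath, the intended use would be along the lines of
RetryingCall.callWithRetry(latch::getLeader, e -> e instanceof KeeperException.NoNodeException, 3, 50L),
so that each retry lists the participant nodes again before reading the leader's data. The attempt count and sleep time here are arbitrary.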
Stack trace of the intermittent exception:
2016-11-15 06:09:33.954 o.a.s.d.nimbus [ERROR] Error when processing event
org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock/_c_97c09eed-5bba-4ac8-a05f-abdc4e8e95cf-latch-0000000002
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:304)
    at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:293)
    at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
    at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:290)
    at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:281)
    at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:42)
    at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderSelector.participantForPath(LeaderSelector.java:375)
    at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:346)
    at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:454)