[
https://issues.apache.org/jira/browse/CURATOR-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682510#comment-15682510
]
ASF GitHub Bot commented on CURATOR-358:
----------------------------------------
Github user Randgalt commented on a diff in the pull request:
https://github.com/apache/curator/pull/173#discussion_r88829307
--- Diff: curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderSelector.java ---
@@ -341,11 +342,41 @@ public Participant getLeader() throws Exception
     static Participant getLeader(CuratorFramework client, Collection<String> participantNodes) throws Exception
     {
+        Participant result = null;
+
         if ( participantNodes.size() > 0 )
         {
-            return participantForPath(client, participantNodes.iterator().next(), true);
+            Iterator<String> iter = participantNodes.iterator();
+            while ( iter.hasNext() )
+            {
+
+                try
+                {
+                    result = participantForPath(client, iter.next(), true);
--- End diff --
Why not have `participantForPath()` handle the exception and return `null`?
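
A minimal sketch of what that suggestion could look like, kept free of Curator dependencies so it runs on its own. Everything here is a hypothetical stand-in rather than Curator's actual code: `NodeReader` stands in for the client read, the nested `NoNodeException` for `KeeperException.NoNodeException`, and a plain `String` for `Participant`. Returning `null` when no live participant is found is also an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these types are Curator's.
final class LeaderLookupSketch
{
    // Hypothetical stand-in for KeeperException.NoNodeException.
    static final class NoNodeException extends Exception
    {
        NoNodeException(String path)
        {
            super("NoNode for " + path);
        }
    }

    // Hypothetical stand-in for the client call that reads a participant node.
    interface NodeReader
    {
        String read(String path) throws NoNodeException;
    }

    // Returns the participant id stored at path, or null if the node
    // disappeared between listing the participants and reading this one.
    static String participantForPath(NodeReader reader, String path)
    {
        try
        {
            return reader.read(path);
        }
        catch ( NoNodeException e )
        {
            return null;
        }
    }

    // Walks the sorted participant nodes and returns the first that still exists.
    static String getLeader(NodeReader reader, Collection<String> participantNodes)
    {
        for ( String path : participantNodes )
        {
            String participant = participantForPath(reader, path);
            if ( participant != null )
            {
                return participant;
            }
        }
        return null; // assumption: no live participant is reported as null
    }

    public static void main(String[] args)
    {
        // Only the second node still exists; the first one has been removed.
        Map<String, String> live = new HashMap<>();
        live.put("/latch-0000000003", "node-b");

        NodeReader reader = path -> {
            String id = live.get(path);
            if ( id == null )
            {
                throw new NoNodeException(path);
            }
            return id;
        };

        Collection<String> nodes = Arrays.asList("/latch-0000000002", "/latch-0000000003");
        System.out.println(getLeader(reader, nodes)); // prints "node-b"
    }
}
```

With the exception handled inside `participantForPath()`, the loop in `getLeader()` needs no `try`/`catch` of its own and only has to skip `null` results.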
> Receiving KeeperException with NoNode when LeaderLatch#getLeader()
> ------------------------------------------------------------------
>
> Key: CURATOR-358
> URL: https://issues.apache.org/jira/browse/CURATOR-358
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 2.10.0
> Reporter: Satish Duggana
> Priority: Critical
>
> org.apache.curator.framework.recipes.leader.LeaderLatch#getLeader()
> intermittently throws KeeperException with Code#NONODE, as shown in the stack
> trace below. The participant's ephemeral ZK node may have been removed
> because its connection/session was closed.
> The relevant code is at
> https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L451
> public Participant getLeader() throws Exception
> {
>     Collection<String> participantNodes = LockInternals.getParticipantNodes(client, latchPath, LOCK_NAME, sorter);
>     return LeaderSelector.getLeader(client, participantNodes);
> }
> This looks like a race condition: the participant node is listed, but by the
> time LeaderSelector#getLeader() reads it, the node has been removed because
> of a session timeout, so the read throws KeeperException with the NoNode code.
> The call is not retried, because RetryLoop retries only on connection/session
> timeouts. In this case, though, NoNode should have been retried. I could not
> find any API on CuratorClient to configure which KeeperException codes are
> retried. It would be good to be able to specify the retryable error codes
> through the org.apache.curator.framework.CuratorFrameworkFactory.Builder APIs.
> Intermittent exception, with stack trace:
> 2016-11-15 06:09:33.954 o.a.s.d.nimbus [ERROR] Error when processing event
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock/_c_97c09eed-5bba-4ac8-a05f-abdc4e8e95cf-latch-0000000002
>     at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>     at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:304)
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:293)
>     at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:290)
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:281)
>     at org.apache.storm.shade.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:42)
>     at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderSelector.participantForPath(LeaderSelector.java:375)
>     at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:346)
>     at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:454)