[jira] [Commented] (CURATOR-696) Double leader for LeaderLatch

Gian Merlino (Jira) Thu, 09 May 2024 15:53:04 -0700


    [ 
https://issues.apache.org/jira/browse/CURATOR-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845147#comment-17845147
 ]


Gian Merlino commented on CURATOR-696:
--------------------------------------

I think we are seeing this same sequence of actions, leading to two active 
leaders, in Apache Druid since updating to Curator 5.4. Details of what we saw 
are here: 
https://github.com/apache/druid/issues/16411#issuecomment-2103564632

One theory is that the change in CURATOR-644 from {{reset()}} to 
{{getChildren()}} on reconnection leads to a situation where the server does 
not realize that its znode no longer exists. On reconnection, the latch recipe 
sees an ephemeral node with the expected name, but that node belongs to a 
previous session, and it goes away when the previous session expires. Perhaps a 
fix could be to check that the session owning the old znode matches the current 
session, rather than checking only the name.
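
To make the suggested check concrete, here is a minimal, self-contained sketch. 
It does not use Curator or ZooKeeper and is not how LeaderLatch is actually 
implemented; {{LatchNode}}, {{ownsByName}}, {{ownsByNameAndSession}}, the node 
name, and the session ids are all hypothetical names and values made up for 
illustration. In a real client, the owner of an ephemeral znode would come from 
{{Stat.getEphemeralOwner()}} and the current session id from 
{{ZooKeeper.getSessionId()}}.

```java
import java.util.ArrayList;
import java.util.List;

public class LatchOwnershipSketch {

    /** Hypothetical stand-in for a latch znode: its name plus the owning session id. */
    static final class LatchNode {
        final String name;
        // In ZooKeeper this value would be read from Stat.getEphemeralOwner().
        final long ephemeralOwner;

        LatchNode(String name, long ephemeralOwner) {
            this.name = name;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    /** Name-only check: the behavior the theory suspects is insufficient. */
    static boolean ownsByName(List<LatchNode> children, String ourName) {
        for (LatchNode child : children) {
            if (child.name.equals(ourName)) {
                return true;
            }
        }
        return false;
    }

    /** Suggested check: the name must match AND the node must belong to the current session. */
    static boolean ownsByNameAndSession(List<LatchNode> children, String ourName, long currentSessionId) {
        for (LatchNode child : children) {
            if (child.name.equals(ourName) && child.ephemeralOwner == currentSessionId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long oldSession = 0x1001L;           // assumed id of the expired session
        long newSession = 0x2002L;           // assumed id of the session after reconnect
        String ourName = "latch-0000000001"; // hypothetical znode name

        // After reconnect, the stale ephemeral node from the old session is still listed.
        List<LatchNode> children = new ArrayList<>();
        children.add(new LatchNode(ourName, oldSession));

        System.out.println("name only: " + ownsByName(children, ourName));
        System.out.println("name and session: " + ownsByNameAndSession(children, ourName, newSession));
    }
}
```

With the stale node still present, the name-only check reports ownership while 
the name-and-session check does not, which is the distinction the suggested fix 
relies on.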

> Double leader for LeaderLatch
> -----------------------------
>
>                 Key: CURATOR-696
>                 URL: https://issues.apache.org/jira/browse/CURATOR-696
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 5.4.0, 5.5.0
>            Reporter: lurna
>            Assignee: Enrico Olivelli
>            Priority: Critical
>
> When I use LeaderLatch to elect a leader, a double-leader situation can 
> occur.
> The timeline is as follows:
> 1. Client A connects and sets its leader status to true.
> 2. ZooKeeper goes offline until client A's session expires.
> 3. ZooKeeper comes back online; client A reconnects and sets its leader 
> status to true using the old path.
> 4. ZooKeeper deletes the old path (client A's) because the session expired.
> 5. Client A does not notice that its node has been deleted and continues to 
> believe that it is the leader.
> 6. Client B connects and, because the node in ZooKeeper is empty, sets its 
> leader status to true.
> 7. Client A and client B are now both leader at the same time.
>  
> This seems to be caused by CURATOR-644 and CURATOR-645.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
