[ https://issues.apache.org/jira/browse/CURATOR-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845147#comment-17845147 ]
Gian Merlino commented on CURATOR-696:
--------------------------------------

I think we see this same sequence of actions, leading to two active leaders, in Apache Druid since updating to Curator 5.4. Details about what we saw are here: https://github.com/apache/druid/issues/16411#issuecomment-2103564632

A theory is that the change in CURATOR-644 from {{reset()}} to {{getChildren()}} on reconnection leads to a situation where the server does not realize that its znode no longer exists. The theory is that the latch recipe sees an ephemeral node with the expected name, but the node is from a previous session, and it goes away when that previous session expires. Perhaps a fix could be to check that the session of the old znode matches the current session, not just the name.

> Double leader for LeaderLatch
> -----------------------------
>
>                 Key: CURATOR-696
>                 URL: https://issues.apache.org/jira/browse/CURATOR-696
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 5.4.0, 5.5.0
>            Reporter: lurna
>            Assignee: Enrico Olivelli
>            Priority: Critical
>
> When I use LeaderLatch to select a leader, a double-leader situation occurs.
> The timeline is as follows:
> 1. Client A connects and sets its leader status to true.
> 2. ZooKeeper goes offline until client A's session expires.
> 3. ZooKeeper comes back online; client A reconnects and sets its leader status to true using the old path.
> 4. ZooKeeper deletes the old path (client A's) because the session expired.
> 5. Client A cannot perceive that its node has been deleted and continues to believe that it is the leader.
> 6. Client B connects and, because the node in ZooKeeper is now gone, sets its leader status to true.
> 7. Client A and client B are now both leader at the same time.
>
> This seems to be due to CURATOR-644 and CURATOR-645.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)