[ 
https://issues.apache.org/jira/browse/CURATOR-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901058#comment-17901058
 ] 

Kuradeon edited comment on CURATOR-724 at 11/26/24 2:22 AM:
------------------------------------------------------------

[~tison] It's the parent of the contender nodes. If the zk cluster goes down 
completely and its data cannot be recovered, this parent node may be missing.

In addition, getChildren tries to reset the contender node via the callback. 
But when the parent node is missing, the callback wouldn't be triggered.  
!image-2024-11-26-10-17-08-859.png!
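
To make the failure mode concrete, here is a toy in-memory model. It is plain Java, not Curator code, and every class and method name in it is made up for illustration. It assumes, as described above, that the reconnect path only resets the contender node when listing the parent's children succeeds, so a missing parent leaves the latch without a node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT Curator code: every name below is hypothetical.
public class LatchRecoveryModel {
    // parent path -> child node names; a missing key means the parent znode is gone
    private final Map<String, List<String>> tree = new HashMap<>();
    private final String latchPath;
    private final String ourNode;

    public LatchRecoveryModel(String latchPath, String ourNode) {
        this.latchPath = latchPath;
        this.ourNode = ourNode;
    }

    // Models reset(): (re)creates our contender node, creating the parent if needed.
    public void reset() {
        List<String> kids = tree.computeIfAbsent(latchPath, p -> new ArrayList<>());
        if (!kids.contains(ourNode)) {
            kids.add(ourNode);
        }
    }

    // Models a total cluster loss where no data can be recovered.
    public void wipe() {
        tree.clear();
    }

    // Models the workaround: make sure the parent exists before the latch reacts.
    public void ensureParent() {
        tree.computeIfAbsent(latchPath, p -> new ArrayList<>());
    }

    // Models the reconnect path (assumption): list children, reset only on success.
    public void onReconnected() {
        List<String> kids = tree.get(latchPath);
        if (kids == null) {
            return; // parent missing: listing fails and nothing recreates our node
        }
        if (!kids.contains(ourNode)) {
            reset(); // parent exists but our node is gone: recreate it
        }
    }

    public boolean hasContender() {
        List<String> kids = tree.get(latchPath);
        return kids != null && kids.contains(ourNode);
    }

    public static void main(String[] args) {
        LatchRecoveryModel m = new LatchRecoveryModel("/leader", "node-1");
        m.reset();
        m.wipe();
        m.onReconnected();
        System.out.println(m.hasContender()); // false: stuck without a contender node
        m.ensureParent();
        m.onReconnected();
        System.out.println(m.hasContender()); // true: recovered once the parent exists
    }
}
```

In this model the latch only recovers once something else recreates the parent node, which is the role the ConnectionStateListener workaround plays.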

As a workaround, I added a ConnectionStateListener that recreates the parent 
node on reconnect, which fixes the issue.
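{code:java}
// Register this listener before constructing the LeaderLatch so that it runs
// ahead of the latch's own internal ConnectionStateListener.
curatorClient.getConnectionStateListenable().addListener((client, newState) -> {
    if (newState.isConnected()) {
        try {
            // Recreate the latch parent node if it was lost with the zk data.
            if (client.checkExists().forPath(leaderPath) == null) {
                client.create()
                        .creatingParentContainersIfNeeded()
                        .forPath(leaderPath);
            }
        } catch (Exception e) {
            // Also reached if another client creates the path between the
            // existence check and the create call.
            log.error("Failed to create leader path {}!", leaderPath, e);
        }
    }
});
LeaderLatch leaderLatch = new LeaderLatch(
        curatorClient, leaderPath, nodeId);
{code}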



> LeaderLatch isn't able to recover after zk recover/leaderPath missing.
> ----------------------------------------------------------------------
>
>                 Key: CURATOR-724
>                 URL: https://issues.apache.org/jira/browse/CURATOR-724
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 5.5.0, 5.6.0, 5.7.0, 5.7.1
>            Reporter: Kuradeon
>            Priority: Major
>         Attachments: image-2024-11-26-10-17-08-859.png
>
>
> zk server: 3.7.2
> Since [https://github.com/apache/curator/pull/430], when zk goes down and 
> recovers, the LeaderLatch calls getChildren() instead of reset() to recover 
> the leader election. But getChildren() only triggers setNode() via the callback of 
> client.getChildren().inBackground(callback).forPath(ZKPaths.makePath(latchPath, null)). 
> If the leaderPath node doesn't exist after zk recovers, the LeaderLatch node 
> will never be recreated.
> A temporary workaround is to add a ConnectionStateListener, registered before 
> the LeaderLatch's internal ConnectionStateListener, that creates the leaderPath 
> node manually. This fixes the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
