[ 
https://issues.apache.org/jira/browse/CURATOR-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728824#comment-16728824
 ] 

ASF GitHub Bot commented on CURATOR-498:
----------------------------------------

GitHub user shayshim opened a pull request:

    https://github.com/apache/curator/pull/298

    CURATOR-498 LeaderLatch deletes leader and leaves it hung beside a second leader

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shayshim/curator master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/curator/pull/298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #298
    
----
commit bdf9ec7a8e89cd1b7c73c7387dc5c38306369ae3
Author: Shay Shimony <shayshim@...>
Date:   2018-12-25T22:53:03Z

    Create empty

----


> LeaderLatch deletes leader and leaves it hung beside a second leader
> --------------------------------------------------------------------
>
>                 Key: CURATOR-498
>                 URL: https://issues.apache.org/jira/browse/CURATOR-498
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 4.0.1, 4.1.0
>         Environment: ZooKeeper 3.4.13, Curator 4.1.0 (selecting explicitly 
> 3.4.13), Linux
>            Reporter: Shay Shimony
>            Assignee: Jordan Zimmerman
>            Priority: Major
>         Attachments: HaWatcher.log, LeaderLatch0.java
>
>
> The Curator app I am working on uses the LeaderLatch to select a leader out 
> of 6 clients.
> While testing my app, I noticed that when I make ZK lose its quorum for a 
> while and then restore it, then after my app restores its connection to ZK, 
> sometimes not all 6 clients are found in the latch path (checked using 
> zkCli.sh). That is, I have 5 instead of 6.
> After investigating a little, I have a suspicion that LeaderLatch deleted the 
> leader in method setNode.
> To investigate it I copied the LeaderLatch code and added some log messages, 
> and from them it seems that a very old create() background callback was 
> unexpectedly scheduled and corrupted the current leader with its stale path 
> name. That is, this old callback called setNode with its stale name, set 
> itself in place of the leader, and deleted the leader. This leaves the client 
> running, thinking it is the leader, while another leader is selected.
> If my analysis is correct then it seems we need to cancel this obsolete 
> create callback.
> Please see attached log file and modified LeaderLatch0.
>  
> In the log, note that 0000000485 is replaced by 0000000480 and then probably 
> deleted.
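
One possible way to make an obsolete create callback harmless, as the report suggests, is to tag each create attempt with a token and ignore any callback whose token is no longer current. The following is only a minimal, self-contained sketch of that idea and is not taken from the Curator code or from the pull request. The class StaleCallbackGuard, its methods, and the example paths are hypothetical; only the sequence numbers 0000000485 and 0000000480 come from the report.

```java
// Hypothetical sketch: none of these names exist in Curator.
final class StaleCallbackGuard {
    private long epoch = 0;   // incremented on every reset
    private String ourPath;   // path of the node we currently own, or null

    /** Starts a new create attempt and returns the token its callback must present. */
    synchronized long beginReset() {
        ourPath = null;
        return ++epoch;
    }

    /**
     * Invoked by the background create callback.
     * Returns false, and changes nothing, if the callback belongs to an
     * earlier attempt, so a stale path can never replace the current one.
     */
    synchronized boolean onNodeCreated(long token, String createdPath) {
        if (token != epoch) {
            return false;     // stale callback from an older attempt: ignore it
        }
        ourPath = createdPath;
        return true;
    }

    synchronized String currentPath() {
        return ourPath;
    }
}
```

In this sketch, a callback that was queued before the connection loss carries an old token, so when it finally runs with a path ending in 0000000480 it is rejected, and the current path ending in 0000000485 stays in place. A real change would also need to decide what to do with the node the stale create actually produced on the server, which this sketch does not address.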



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
