[
https://issues.apache.org/jira/browse/CURATOR-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754581#comment-16754581
]
ASF GitHub Bot commented on CURATOR-498:
----------------------------------------
Github user cammckenzie commented on a diff in the pull request:
https://github.com/apache/curator/pull/303#discussion_r251685066
--- Diff:
curator-framework/src/main/java/org/apache/curator/framework/imps/CreateBuilderImpl.java
---
@@ -48,19 +50,21 @@
public class CreateBuilderImpl implements CreateBuilder, CreateBuilder2,
BackgroundOperation<PathAndBytes>, ErrorListenerPathAndBytesable<String>
{
+ private final Logger log = LoggerFactory.getLogger(getClass());
private final CuratorFrameworkImpl client;
private CreateMode createMode;
private Backgrounding backgrounding;
private boolean createParentsIfNeeded;
private boolean createParentsAsContainers;
- private boolean doProtected;
private boolean compress;
private boolean setDataIfExists;
private int setDataIfExistsVersion = -1;
- private String protectedId;
private ACLing acling;
private Stat storingStat;
private long ttl;
+ private boolean doProtected;
+ private String protectedId;
+ private long initialSessionId;
--- End diff --
Rename this to protectedEphemeralSessionID? And then maybe move the
initialisation check so that it only occurs when Curator is actually creating
a protected ephemeral node, rather than always initialising on the first
forPath() call?
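A minimal sketch of what that suggestion could look like, assuming a simplified builder. The class ProtectedCreateSketch, the sessionIdSource supplier and the accessor names are hypothetical stand-ins for illustration, not Curator's actual CreateBuilderImpl:

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a create builder; NOT Curator's CreateBuilderImpl.
class ProtectedCreateSketch {
    // Assumption: yields the ID of the current ZooKeeper session.
    private final LongSupplier sessionIdSource;
    private final boolean doProtected;
    private final boolean ephemeral;

    private boolean sessionIdCaptured;
    private long protectedEphemeralSessionId;

    ProtectedCreateSketch(LongSupplier sessionIdSource, boolean doProtected, boolean ephemeral) {
        this.sessionIdSource = sessionIdSource;
        this.doProtected = doProtected;
        this.ephemeral = ephemeral;
    }

    String forPath(String path) {
        // Capture the session ID only for protected ephemeral creates,
        // and only once, instead of on every first forPath() call.
        if (doProtected && ephemeral && !sessionIdCaptured) {
            protectedEphemeralSessionId = sessionIdSource.getAsLong();
            sessionIdCaptured = true;
        }
        return path; // a real builder would issue the create request here
    }

    boolean isSessionIdCaptured() {
        return sessionIdCaptured;
    }

    long getProtectedEphemeralSessionId() {
        return protectedEphemeralSessionId;
    }
}
```

With this shape, a non-protected or non-ephemeral create never touches the session ID at all.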
> Protected Mode creation can mistake closing session's node causing problems
> for many recipes such as LeaderLatch
> ----------------------------------------------------------------------------------------------------------------
>
> Key: CURATOR-498
> URL: https://issues.apache.org/jira/browse/CURATOR-498
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 4.0.1, 4.1.0
> Environment: ZooKeeper 3.4.13, Curator 4.1.0 (selecting explicitly
> 3.4.13), Linux
> Reporter: Shay Shimony
> Assignee: Jordan Zimmerman
> Priority: Blocker
> Fix For: 4.1.1
>
> Attachments: CURATOR-498.png, HaWatcher.log, LeaderLatch0.java,
> ha.tar.gz, logs.tar.gz, reproduction.tar.gz, reproduction2.tar.gz
>
>
> The Curator app I am working on uses the LeaderLatch to select a leader out
> of 6 clients.
> While testing my app, I noticed that when I make ZK lose its quorum for a
> while and then restore it, then after Curator in my app restores its
> connection to ZK, sometimes not all 6 clients are found in the latch
> path (using zkCli.sh). That is, I see 5 instead of 6.
> After investigating a little, I suspect that LeaderLatch deleted the
> leader's node in the setNode method.
> To investigate, I copied the LeaderLatch code and added some log messages.
> From them it seems that a very old create() background callback was
> unexpectedly scheduled and overwrote the current leader's path with its stale
> path name. That is, the old callback called setNode with its stale name, set
> itself in place of the leader, and deleted the leader's node. This leaves a
> client running that thinks it is the leader while another leader is selected.
> If my analysis is correct, then it seems we need to cancel this obsolete
> create callback (I think its session was suspended at 22:38:54 and then lost
> at 22:39:04, so ongoing callbacks should be cancelled on SUSPENDED).
> Please see attached log file and modified LeaderLatch0.
>
> In the log, note that at 22:39:26 it shows that 0000000485 is replaced by
> 0000000480 and then probably deleted.
> Note also that at 22:38:52, 34 seconds earlier, we can see that it was in the
> reset() method ("RESET OUR PATH"), which possibly triggered the creation of
> 0000000480.
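The cancellation idea in the report can be pictured with a generation token: every reset (or SUSPENDED/LOST event) starts a new generation, and a create callback is accepted only if it carries the current token. This is only an illustrative sketch under that assumption; StaleCallbackGuard, its methods and the /latch/... paths are hypothetical and are not part of Curator or the fix under review:

```java
// Hypothetical guard against stale background create() callbacks.
// NOT Curator code; names and behaviour are assumptions for illustration.
class StaleCallbackGuard {
    private long generation; // bumped on every reset or connection loss
    private String ourPath;  // node path currently owned by this participant

    // Call on reset() or when the connection becomes SUSPENDED/LOST.
    // Returns the token to attach to the next create() request.
    synchronized long newGeneration() {
        ourPath = null;
        return ++generation;
    }

    // Invoked from the create() callback. A callback from an older
    // generation is rejected, so it cannot replace the current path.
    synchronized boolean onCreated(long token, String createdPath) {
        if (token != generation) {
            return false; // stale: the caller would delete createdPath, not ourPath
        }
        ourPath = createdPath;
        return true;
    }

    synchronized String currentPath() {
        return ourPath;
    }
}
```

In the scenario from the log, the callback that created 0000000480 would carry an outdated token and be rejected, leaving 0000000485 in place.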