[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321334#comment-15321334
 ] 

Martin Kuchta commented on ZOOKEEPER-2355:
------------------------------------------

I've been playing with some variations on the proposed fix and trying to reason 
about what's actually going wrong. When syncing with the leader, of the three 
leader responses (DIFF, SNAP, TRUNC), I think there's only an issue with 
setting the last processed ZXID the way it's currently done in the DIFF case. 
In the SNAP and TRUNC cases, we've already deserialized the snapshot or 
truncated the log by the time setLastProcessedZxid is called. In the DIFF case, 
it's incorrect because we're setting the last processed ZXID as if we've 
already committed all the transactions we're about to receive, so a failure 
before those commits actually happen leaves us in an inconsistent state.
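
To make the DIFF problem concrete, here is a rough toy model in Java. It is not 
ZooKeeper code: ToyFollower, sync, the failAt parameter, and the zxid values 
are all made up for illustration, and it simplifies the real flow by assuming 
proposals are buffered and applied only once the whole diff has arrived.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a follower syncing via DIFF. Illustration only, not ZooKeeper code. */
public class DiffSyncSketch {

    /** Hypothetical stand-in for the follower's database state. */
    static class ToyFollower {
        long lastProcessedZxid;
        final List<Long> applied = new ArrayList<>();

        ToyFollower(long startZxid) {
            this.lastProcessedZxid = startZxid;
        }

        void commit(long zxid) {
            applied.add(zxid);
            lastProcessedZxid = zxid; // advances only when a transaction is applied
        }
    }

    /**
     * Replays a diff of zxids onto a fresh follower.
     *
     * @param failAt     index of the proposal at which the connection fails, or -1 for no failure
     * @param setEagerly if true, mimic setting lastProcessedZxid when DIFF is received
     */
    static ToyFollower sync(long startZxid, long[] diff, int failAt, boolean setEagerly) {
        ToyFollower follower = new ToyFollower(startZxid);
        if (setEagerly && diff.length > 0) {
            // Pretends every transaction in the diff is already committed.
            follower.lastProcessedZxid = diff[diff.length - 1];
        }
        List<Long> pending = new ArrayList<>();
        for (int i = 0; i < diff.length; i++) {
            if (i == failAt) {
                // Simulated failure while reading a proposal: buffered proposals are lost.
                return follower;
            }
            pending.add(diff[i]);
        }
        // The whole diff arrived, so apply the buffered transactions in order.
        for (long zxid : pending) {
            follower.commit(zxid);
        }
        return follower;
    }

    public static void main(String[] args) {
        long[] diff = {101L, 102L, 103L};
        ToyFollower eager = sync(100L, diff, 1, true);
        ToyFollower lazy = sync(100L, diff, 1, false);
        System.out.println("eager: zxid=" + eager.lastProcessedZxid + " applied=" + eager.applied);
        System.out.println("lazy:  zxid=" + lazy.lastProcessedZxid + " applied=" + lazy.applied);
    }
}
```

In the eager run the follower reports the leader's ZXID even though nothing was 
applied, which is the kind of mismatch described above. In the lazy run the 
ZXID stays where it was, so a later resync would still ask for the missing 
transactions.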

The logic in the patch of moving the call to setLastProcessedZxid to when the 
follower receives UPTODATE or NEWLEADER makes sense to me, but this isn't 
consistent with the behavior expected by some of the other unit tests.

I don't think setLastProcessedZxid needs to be explicitly called at all when 
the follower receives a DIFF message because we will update the last processed 
ZXID as we commit transactions received from the leader anyway. I do think it 
needs to be preserved as-is for SNAP and TRUNC to keep the currently expected 
behavior. Whether there are other problematic scenarios associated with how 
SNAP and TRUNC are processed can be investigated separately since there may 
still be cases where the last processed ZXID and the actual transaction log 
state are out of sync.
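
The rule I'm proposing can be summarized in a few lines of Java. This is only a 
sketch of the decision, not the actual change to Learner.java: the SyncType 
enum and the zxidAfterSyncPacket method are hypothetical names invented for 
this example.

```java
/** Sketch of the proposed rule for each sync response. Not the real Learner.java code. */
public class SyncHandlingSketch {

    /** Hypothetical enum for the three leader sync responses. */
    enum SyncType { DIFF, SNAP, TRUNC }

    /**
     * Returns the last processed zxid the follower should record right after
     * handling the leader's sync response.
     */
    static long zxidAfterSyncPacket(SyncType type, long currentZxid, long leaderZxid) {
        switch (type) {
            case SNAP:
            case TRUNC:
                // The snapshot is already deserialized or the log already truncated,
                // so the state really is at the leader's zxid.
                return leaderZxid;
            case DIFF:
                // Nothing has been committed yet; the zxid advances later,
                // one transaction at a time, as commits are applied.
                return currentZxid;
            default:
                throw new IllegalArgumentException("Unknown sync type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(zxidAfterSyncPacket(SyncType.DIFF, 100L, 103L));
        System.out.println(zxidAfterSyncPacket(SyncType.SNAP, 100L, 103L));
        System.out.println(zxidAfterSyncPacket(SyncType.TRUNC, 100L, 98L));
    }
}
```

Keeping SNAP and TRUNC unchanged preserves the behavior the existing tests 
expect, while DIFF simply stops claiming progress it hasn't made.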

I'm submitting a modified version of the patch provided by [~arshad.mohammad]. 
The patch includes his original unit test, which still fails against trunk and 
passes with the patch, but the changes to Learner.java are the slightly 
different ones that I'm proposing.

(I do have two unit tests failing locally that are also failing against trunk, 
so I think it's an unrelated issue with my environment that I'll need to look 
into when I get time. If the Jenkins build shows that isn't the case, I'll 
investigate.)

> Ephemeral node is never deleted if follower fails while reading the proposal 
> packet
> -----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2355
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>             Fix For: 3.4.9
>
>         Attachments: ZOOKEEPER-2355-01.patch, ZOOKEEPER-2355-02.patch
>
>
> A ZooKeeper ephemeral node is never deleted if a follower fails while reading 
> the proposal packet.
> The scenario is as follows:
> # Configure a three-node ZooKeeper cluster with nodes A, B, and C. Start all 
> of them; assume A is the leader and B and C are followers.
> # Connect to any of the servers and create ephemeral node /e1.
> # Close the session; ephemeral node /e1 will be scheduled for deletion.
> # While Follower B is receiving the delete proposal, make it fail with 
> {{SocketTimeoutException}}. The fault is injected here to reproduce the 
> scenario; in a production environment it happens because of a network fault.
> # Remove the fault and check that the faulted follower has reconnected to the 
> quorum.
> # Connect to any of the servers and create the same ephemeral node /e1; the 
> creation succeeds.
> # Close the session; ephemeral node /e1 will be scheduled for deletion.
> # {color:red}/e1 is not deleted from the faulted Follower B. It should have 
> been deleted, as it was created again with another session.{color}
> # {color:green}/e1 is deleted from Leader A and the other Follower C.{color}



