[jira] [Commented] (ZOOKEEPER-1777) Missing ephemeral nodes in one of the members of the ensemble

JIRA Wed, 02 Oct 2013 20:40:02 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784784#comment-13784784
 ]


Germán Blanco commented on ZOOKEEPER-1777:
------------------------------------------

I bet it is even more confusing for me :-). But I do have the logs of how I 
reproduced the problem, so I will upload them. In any case it was easy to 
reproduce: I just followed the steps above with an ensemble of three and 
arbitrary transactions. 
Forcing a snapshot on every synchronisation is not the only solution. It can 
also be solved with a check that verifies that the followers have the same 
history as the leader. Since synchronisation is the only point at which a 
divergent history could join the ensemble, a check of the last transaction 
should be enough. 
The check could be done by comparing either the entire transaction information 
or a checksum of it. This information (transaction info or checksum) could be 
sent from the follower to the leader any time before the leader decides whether 
to synchronise using DIFF, TRUNC or SNAP, and the leader could then send a SNAP 
if the information did not match (and log a big WARN message).
This also covers the case of an operator mistakenly starting one of the members 
of the ensemble with a data dir coming from another ensemble.
However, this does mean a small change in the protocol, which can be made 
while keeping backwards compatibility: the leader advertises that it can 
optionally receive this information, and the follower sends it only if the 
leader supports it.
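
A minimal sketch of how the leader-side check described above might look, 
assuming a CRC32 checksum over the serialized last transaction. None of this is 
taken from the ZooKeeper code base: the class SyncCheck, the SyncMode enum, the 
method names and the in-memory leaderLog map are all hypothetical, and a null 
checksum stands in for a follower that does not send the optional information. 
Keeping the existing decision when the leader cannot look up the follower's 
last transaction is also an assumption of this sketch.

```java
import java.util.Map;
import java.util.logging.Logger;
import java.util.zip.CRC32;

// Hypothetical sketch; these names do not exist in ZooKeeper.
final class SyncCheck {
    private static final Logger LOG = Logger.getLogger("SyncCheck");

    enum SyncMode { DIFF, TRUNC, SNAP }

    /** CRC32 over the serialized bytes of a single transaction. */
    static long checksum(byte[] txnBytes) {
        CRC32 crc = new CRC32();
        crc.update(txnBytes);
        return crc.getValue();
    }

    /**
     * Decide how to synchronise a follower.
     *
     * @param leaderLog        zxid -> serialized transaction, a stand-in for the leader's log
     * @param followerLastZxid zxid of the follower's last transaction
     * @param followerChecksum checksum of that transaction as reported by the follower,
     *                         or null if the follower did not send one
     * @param proposed         mode the leader would have chosen without the check
     */
    static SyncMode chooseSyncMode(Map<Long, byte[]> leaderLog,
                                   long followerLastZxid,
                                   Long followerChecksum,
                                   SyncMode proposed) {
        if (followerChecksum == null) {
            // Follower did not send the optional information: behave as before.
            return proposed;
        }
        byte[] leaderTxn = leaderLog.get(followerLastZxid);
        if (leaderTxn == null) {
            // Assumption: nothing to compare against, so keep the existing decision.
            return proposed;
        }
        if (checksum(leaderTxn) != followerChecksum) {
            // Same zxid but different content: the histories have diverged.
            LOG.warning("Follower history differs from leader at zxid 0x"
                    + Long.toHexString(followerLastZxid) + "; forcing SNAP");
            return SyncMode.SNAP;
        }
        return proposed;
    }
}
```

With this shape, a follower whose last transaction matches the leader's keeps 
the normally chosen mode, while a follower started from a foreign data dir 
would be expected to fail the comparison and receive a full snapshot.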

> Missing ephemeral nodes in one of the members of the ensemble
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1777
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1777
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.5
>         Environment: Linux, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: snaps.tar
>
>
> In a 3-server ensemble, one of the followers doesn't see some of the 
> ephemeral nodes that are present in the leader and the other follower. 
> The 8 missing nodes in "the follower that is not ok" were created at the end 
> of epoch 1; the ensemble is running in epoch 2.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
