[
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807789#comment-13807789
]
Germán Blanco commented on ZOOKEEPER-1732:
------------------------------------------
I think that the problem might be in the epoch handling. When we used
"newEpoch" instead of "newEpoch-1", the result is that an old leader still
reports "newEpoch-1" while the updated followers report "newEpoch". As a
result, a new server that tries to join sees inconsistent votes, even when
it is ignoring zxid and election epoch information.
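
To illustrate the mismatch described above, here is a minimal sketch. It is a
simplified model and not the actual FastLeaderElection code: the class
EpochVoteSketch, the Vote type and hasMatchingQuorum are hypothetical names,
and the assumption is that a joining server only follows a leader when more
than half of the ensemble reports the same (leader, epoch) pair.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified model for illustration only; not ZooKeeper's election code.
final class EpochVoteSketch {

    // Hypothetical vote: the proposed leader plus the epoch the sender reports.
    static final class Vote {
        final long leaderId;
        final long peerEpoch;

        Vote(long leaderId, long peerEpoch) {
            this.leaderId = leaderId;
            this.peerEpoch = peerEpoch;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Vote)) {
                return false;
            }
            Vote other = (Vote) o;
            return leaderId == other.leaderId && peerEpoch == other.peerEpoch;
        }

        @Override
        public int hashCode() {
            return Objects.hash(leaderId, peerEpoch);
        }
    }

    // Assumed rule: true only if strictly more than half of the ensemble
    // sent an identical vote (same leader and same epoch).
    static boolean hasMatchingQuorum(List<Vote> votes, int ensembleSize) {
        Map<Vote, Integer> counts = new HashMap<>();
        for (Vote v : votes) {
            counts.merge(v, 1, Integer::sum);
        }
        for (int count : counts.values()) {
            if (count > ensembleSize / 2) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, in a three-server ensemble a leader vote of
(leader 2, epoch 4) and a follower vote of (leader 2, epoch 5) do not form a
quorum, so the joining server keeps looking. Two votes of (leader 2, epoch 5)
would form one.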
> ZooKeeper server unable to join established ensemble
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Environment: Windows 7, Java 1.7
> Reporter: Germán Blanco
> Assignee: Germán Blanco
> Priority: Blocker
> Fix For: 3.4.6, 3.5.0
>
> Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz,
> ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch,
> ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-b3.4.patch,
> ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch,
> ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch
>
>
> I have a test in which I do a rolling restart of three ZooKeeper servers and
> it was failing from time to time.
> I ran the tests in a loop until the failure came out and it seems that at
> some point one of the servers is unable to join the ensemble formed by the
> other two.
--
This message was sent by Atlassian JIRA
(v6.1#6144)