[
https://issues.apache.org/jira/browse/ZOOKEEPER-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113090#comment-17113090
]
Mate Szalay-Beko commented on ZOOKEEPER-3842:
---------------------------------------------
[~kaushik srinivas]
{quote}Please help us in giving little more clarity on this PR.
{quote}
Feel free to comment on the PR itself next time; that way our conversation will
be visible to the reviewers too :)
{quote}Change 1 : As mentioned in ZOOKEEPER-3830, new node became leader and
lastSeenQuorumVerifier does not contain the new node. So this is explicitly
reset/updated with the new node if dynamic reconfig is disabled and new node is
becoming the leader ?
{quote}
The {{lastSeenQuorumVerifier}} should contain the last config we saw during /
after the last leader election. As far as I can remember, it is set in the
Followers by the NEWLEADER message sent by the Leader. When dynamic-reconfig is
disabled, your config should in theory be static, so I think it is OK to
reset {{lastSeenQuorumVerifier}} in the leader to the current config (which
usually comes from zoo.cfg). At least this is the idea behind the change.
{quote}Change 2: if dynamic reconfig is enabled, then getDesignatedLeader hook
is removed. Is this "getDesignatedLeader" code introduced purely from dynamic
reconfig feature and is causing conflicts if reconfig is disabled and in turn
causing all the nodes to set allowedToCommit = false ?
{quote}
Yes, you see it right. In this part we handle a dynamic-reconfig edge case where
the currently elected Leader is actually not the leader anymore, and we do not
allow it to commit anything. This should never happen when
dynamic-reconfig is disabled.
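A rough sketch of that check, as I understand it. Again this is not the actual Leader code: {{DesignatedLeaderSketch}} and both method signatures are hypothetical, the way a replacement leader is picked is an assumption, and so is the -1 return value for "nobody eligible".

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model; NOT the real Leader class.
final class DesignatedLeaderSketch {

    // Returns the server that should lead under the given config:
    // ourselves if we are a voter in it, otherwise the first acker that is.
    // Returns -1 if nobody is eligible (an assumption made for this sketch).
    static long designatedLeader(long selfId, Set<Long> voters, Set<Long> ackers) {
        if (voters.contains(selfId)) {
            return selfId;
        }
        for (long id : ackers) {
            if (voters.contains(id)) {
                return id;
            }
        }
        return -1L;
    }

    // With reconfig disabled the check is skipped entirely; with reconfig
    // enabled the leader may commit only if it is the designated leader.
    static boolean allowedToCommit(boolean reconfigEnabled, long selfId,
                                   Set<Long> lastSeenVoters, Set<Long> ackers) {
        if (!reconfigEnabled) {
            return true;
        }
        return designatedLeader(selfId, lastSeenVoters, ackers) == selfId;
    }

    public static void main(String[] args) {
        // Server 4 is elected, but the last-seen config only lists 1,2,3.
        Set<Long> lastSeen = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> ackers = new TreeSet<>(Arrays.asList(1L, 2L));
        System.out.println("check applied: "
                + allowedToCommit(true, 4L, lastSeen, ackers));
        System.out.println("check skipped: "
                + allowedToCommit(false, 4L, lastSeen, ackers));
    }
}
```

The first call models the behaviour reported in this ticket (the new node is elected but may not commit); the second models skipping the check when dynamic-reconfig is disabled.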
I spent a lot of time trying to figure out this part of ZooKeeper (I had never
touched the dynamic reconfig before). I am fairly confident that the patch will
fix the issue, but I don't want to push the PR until some of the original
developers of dynamic-reconfig can review it.
Kind regards,
Mate
> Rolling scale up of zookeeper cluster does not work with reconfigEnabled=false
> ------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3842
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3842
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.5.7
> Reporter: kaushik srinivas
> Priority: Blocker
> Labels: features
>
> With
> *reconfigEnabled = false (not explicitly setting, relying on the default
> value).*
>
> Install 3 zookeeper servers, each configured with the information of all 3
> zookeeper quorum servers.
>
> Do a rolling scale-up of the cluster from 3 to 5 with the steps below.
>
> 1. Install 4th zookeeper with servers list of 1,2,3,4,5
> 2. Install 5th zookeeper with servers list of 1,2,3,4,5
> 3. Do a rolling restart of servers 1 2 & 3 with servers list of 1,2,3,4,5.
>
> Result/Behavior: quorum is lost.
>
> The description of https://issues.apache.org/jira/browse/ZOOKEEPER-2819
> points at a PR [https://github.com/apache/zookeeper/pull/292]
> which should have fixed this rolling restart issue without the dynamic
> reconfiguration feature enabled.
>
> We still see quorum loss issues without the dynamic reconfig feature enabled.
>
>