[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113090#comment-17113090
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-3842:
---------------------------------------------

[~kaushik srinivas]
{quote}Please help us in giving little more clarity on this PR.
{quote}
Feel free to comment on the PR itself next time; that way our conversation will be 
visible to the reviewers too :)

 
{quote}Change 1 : As mentioned in ZOOKEEPER-3830, new node became leader and 
lastSeenQuorumVerifier does not contain the new node. So this is explicitly 
reset/updated with the new node if dynamic reconfig is disabled and new node is 
becoming the leader ?
{quote}
The {{lastSeenQuorumVerifier}} should contain the last config we saw during or 
after the last leader election. As far as I can remember, it is set in the 
Followers by the NEWLEADER message sent by the Leader. When dynamic-reconfig is 
disabled, your config should in theory be static, so I think it is OK to 
reset {{lastSeenQuorumVerifier}} in the Leader to the current config (which 
usually comes from zoo.cfg). At least this is the idea behind the change.
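
A minimal sketch of this idea, using hypothetical names rather than the actual ZooKeeper classes or the patch itself. A config is reduced to a set of server ids, and {{lastSeenForNewLeader}} is an invented helper:

```java
import java.util.Set;

// Hypothetical, simplified model; NOT the real ZooKeeper code or the patch.
final class LastSeenConfigSketch {

    /**
     * Picks the config a newly elected leader treats as "last seen".
     * Assumption: with reconfig disabled the config is static, so the
     * current config (from zoo.cfg) replaces whatever was seen earlier.
     */
    public static Set<Long> lastSeenForNewLeader(boolean reconfigEnabled,
                                                 Set<Long> currentConfig,
                                                 Set<Long> lastSeenConfig) {
        return reconfigEnabled ? lastSeenConfig : currentConfig;
    }

    public static void main(String[] args) {
        Set<Long> stale = Set.of(1L, 2L, 3L);              // seen before the scale-up
        Set<Long> fromZooCfg = Set.of(1L, 2L, 3L, 4L, 5L); // current static config

        // Server 4 becomes leader; the stale config does not contain it.
        Set<Long> used = lastSeenForNewLeader(false, fromZooCfg, stale);
        System.out.println(used.contains(4L)); // prints "true"
    }
}
```

With reconfig enabled, the sketch leaves the last seen config untouched, since in that mode it can legitimately differ from the current one.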

 
{quote}Change 2: if dynamic reconfig is enabled, then getDesignatedLeader hook 
is removed. Is this "getDesignatedLeader" code introduced purely from dynamic 
reconfig feature and is causing conflicts if reconfig is disabled and in turn 
causing all the nodes to set allowedToCommit = false ?
{quote}
Yes, you see it right. In this part we handle a dynamic-reconfig edge case where 
the currently elected Leader is actually not the leader anymore, so we do not 
allow it to commit anything. This should never happen when 
dynamic-reconfig is disabled.
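
Roughly, the intent can be sketched as follows. The names are hypothetical and this is a simplified model, not the actual Leader code; it assumes the designated-leader check is simply skipped when reconfig is disabled:

```java
// Hypothetical, simplified model; NOT the real ZooKeeper code or the patch.
final class AllowedToCommitSketch {

    /**
     * Decides whether the elected leader may keep committing.
     * Assumption: the designated-leader check only applies when
     * dynamic reconfig is enabled; with a static config it is skipped.
     */
    public static boolean allowedToCommit(boolean reconfigEnabled,
                                          long myId,
                                          long designatedLeaderId) {
        if (!reconfigEnabled) {
            return true; // static config: the elected leader stays the leader
        }
        return designatedLeaderId == myId;
    }

    public static void main(String[] args) {
        // Reconfig disabled: a mismatching designated leader is ignored.
        System.out.println(allowedToCommit(false, 4L, 1L)); // prints "true"
        // Reconfig enabled: a mismatch blocks further commits.
        System.out.println(allowedToCommit(true, 4L, 1L));  // prints "false"
    }
}
```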

I spent a lot of time trying to figure out this part of ZooKeeper (I had never 
touched dynamic reconfig before). I am fairly confident that the patch will 
fix the issue, but I don't want to push the PR until some of the original 
developers of dynamic-reconfig can review it.

Kind regards,
 Mate

> Rolling scale up of zookeeper cluster does not work with reconfigEnabled=false
> ------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3842
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3842
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.5.7
>            Reporter: kaushik srinivas
>            Priority: Blocker
>              Labels: features
>
> With 
> *reconfigEnabled = false (not explicitly setting, relying on the default 
> value).*
>  
> Install 3 zookeeper servers with 3 zk information in all the 3 zookeeper 
> quorum servers.
>  
> Do a rolling scale up of cluster from 3 to 5 with below steps.
>  
> 1. Install 4th zookeeper with servers list of 1,2,3,4,5
> 2. Install 5th zookeeper with servers list of 1,2,3,4,5
> 3. Do a rolling restart of servers 1 2 & 3 with servers list of 1,2,3,4,5.
>  
> Result/Behavior: quorum is lost.
>  
> With this https://issues.apache.org/jira/browse/ZOOKEEPER-2819
> description points at a PR [https://github.com/apache/zookeeper/pull/292]
> which should have this issue of rolling restart fixed without dynamic 
> reconfiguration feature enabled.
>  
> We still see quorum loss issues without dynamic reconfig feature enabled.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
