[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185924#comment-17185924
 ] 

Andre Price commented on ZOOKEEPER-3920:
----------------------------------------

I believe case [2] will not have the issue because the IP/address that a 
server is locally configured with will match what is stored in the ensemble's 
"QuorumVerifier" state (i.e. server 1 will believe it is 172.17.0.2, which is 
equal to 172.17.0.2 in the shared config).
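
To make that concrete, here is roughly the shape of per-server config I have in 
mind for [2] (a sketch only; apart from 172.17.0.2 for server 1, the addresses 
and ports below are assumptions on my part, not taken from your setup):

    server.1=172.17.0.2:2888:3888
    server.2=172.17.0.3:2888:3888
    server.3=172.17.0.4:2888:3888

Because server 1's own line carries its real address rather than a wildcard, 
what it reads locally should match what the ensemble stores in the 
QuorumVerifier, so the mismatch behind this issue shouldn't appear.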

I'm afraid I might be too unfamiliar with ZooKeeper internals to directly help 
fix this (i.e. own the PR); this is the first time I've looked at the code at 
all, but I'll try to help out where I can in the discussion and by relaying 
what my tests have shown.

For what it is worth, I think there are a few things going on, and I'm not sure 
what the right "fix" is given the interaction with reconfiguration:
 * I think that if reconfiguration is disabled, there should be no usage of the 
dynamic configuration at all: the local static config should be used for the 
QuorumVerifier, not the dynamic config stored by the ensemble, which appears to 
be what is actually used. I am actually curious whether it is even possible to 
modify the configuration with rolling deployments at this point (I will try to 
test). I understand that reconfiguration/dynamic config was originally the only 
mode of operation and was later made optional; perhaps the static config path 
isn't as well supported any more. (A sketch of the static setup I mean follows 
this list.)
 * Thinking through the handling of 0.0.0.0 as a server address: I think many 
configurations running in containers opt to use it as the local server's 
address because it simplifies binding. Otherwise the containers may need to run 
in host network mode to successfully bind the publicly accessible IP of the 
container (the Docker host's). It almost feels like a separate bind-address 
setting, independent of the server config, may make sense (the 0.0.0.0 sketch 
after this list shows the pattern I mean). In our case I _think_ we can get 
3.6.1 working if we configure something similar to your [2] config but use the 
Docker hosts' IPs and require the containers to use the host network, but we 
can only do that if we have that config from the start. (Did you try moving 
from [1] -> [2]?)
 * Fixes in a new version aside, for folks who have a cluster that has gotten 
into this state, what is the proper way to "clean" it? We have downgraded a 
cluster to restore functionality, but the /zookeeper/config znode remains, so 
I'm not sure whether upgrading again will pick it up and use it (going to try 
to test that also; the zkCli sketch after this list is how I've been checking 
the znode).
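
To illustrate the first bullet, the static-only setup I have in mind is roughly 
the following (a sketch only; the hostnames, ports and paths are placeholders, 
not our actual values):

    # zoo.cfg on every node, reconfiguration explicitly disabled
    reconfigEnabled=false
    dataDir=/data/zookeeper
    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

My expectation was that with reconfigEnabled=false these server.N lines alone 
would drive the QuorumVerifier, and anything stored in the /zookeeper/config 
znode (or a zoo.cfg.dynamic.* file) would be ignored, but that does not appear 
to be what happens.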
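
The 0.0.0.0 pattern from the second bullet looks roughly like this on server 1 
(servers 2 and 3 have the same file with 0.0.0.0 swapped onto their own line; 
the other addresses here are placeholders):

    server.1=0.0.0.0:2888:3888
    server.2=10.0.0.2:2888:3888
    server.3=10.0.0.3:2888:3888

The appeal is that each container can bind its quorum and election ports 
without knowing its externally visible address; the downside, if I'm reading 
this issue right, is that the 0.0.0.0 entry can end up in the configuration the 
ensemble shares, where it is meaningless to the other servers.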
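
On the last point, this is how I've been checking what the downgraded cluster 
still has stored (the connect string is a placeholder):

    # connect with zkCli to any member of the downgraded ensemble
    bin/zkCli.sh -server 10.0.0.1:2181
    # then, at the zkCli prompt, dump the stored dynamic configuration
    get /zookeeper/config

In our case that znode is still populated, and I don't know of a supported way 
to clear it while reconfiguration is disabled, which is exactly what I'm asking 
about.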

 

> Zookeeper clients timeout after leader change
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-3920
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3920
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.6.1
>            Reporter: Andre Price
>            Priority: Major
>         Attachments: stack.yml, zk_repro.zip
>
>
> [Sorry, I believe this is a dupe of 
> https://issues.apache.org/jira/browse/ZOOKEEPER-3828 and potentially 
> https://issues.apache.org/jira/browse/ZOOKEEPER-3466, 
> but I am not able to attach files there for some reason, so I am creating a 
> new issue which hopefully allows me to.]
> We are encountering an issue where failing over from the leader results in 
> ZooKeeper clients not being able to connect successfully; they time out 
> waiting for a response from the server. We are attempting to upgrade some 
> existing ZooKeeper clusters from 3.4.14 to 3.6.1 (not sure if this is 
> relevant, but stating it in case it helps with pinpointing the issue), and 
> the upgrade is effectively blocked by this issue. We perform the rolling 
> upgrade (followers first, then the leader last) and it seems to go 
> successfully by all indicators. But we end up in the state described in this 
> issue, where if the leader changes (either due to a restart or stopping) the 
> cluster does not seem able to start new sessions.
> I've gathered some TRACE logs from our servers and will attach them in the 
> hope they can help figure this out. 
> Attached zk_repro.zip which contains the following:
>  * zoo.cfg used in one of the instances (they are all the same except for the 
> local server's IP being 0.0.0.0 in each)
>  * zoo.cfg.dynamic.next (I don't think this is used anywhere, but it is 
> written by ZooKeeper at some point, I think when the first 3.6.1 container 
> becomes leader, based on its value; the file is present and identical in all 
> containers)
>  * s{1,2,3}_zk.log - logs from each of the 3 servers. The estimated start of 
> the repro is indicated by "// REPRO START" text and whitespace in the logs
>  * repro_steps.txt - rough steps executed that result in the attached server 
> logs
>  
> I'll summarize the repro here also:
>  # Initially it appears to be a healthy 3-node ensemble, all running 3.6.1. 
> Server IDs are 1, 2, 3 and 3 is the leader. Dynamic config/reconfiguration is 
> disabled.
>  # Invoke srvr on each node (to verify the setup and also create a bookmark 
> in the logs)
>  # Do a zkCli get of /zookeeper/quota, which succeeds
>  # Do a restart of the leader (to same image/config) (server 2 now becomes 
> leader, 3 is back as follower)
>  # Try to perform the same zkCli get which times out (this get is done within 
> the container)
>  # Try to perform the same zkCli get but from another machine, this also 
> times out
>  # Invoke srvr on each node again (to verify that 2 is now the 
> leader/bookmark)
>  # Do a restart of server 2 (3 becomes leader, 2 follower)
>  # Do a zkCli get of /zookeeper/quota which succeeds
>  # Invoke srvr on each node again (to verify that 3 is leader)
> I tried to keep the other ZK traffic to a minimum, but there are likely some 
> periodic mntr requests mixed in from our metrics scraper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
