[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189721#comment-17189721
 ] 

Andre Price commented on ZOOKEEPER-3920:
----------------------------------------

[~maoling] The advantage of using 0.0.0.0 is, as you say, that it simplifies 
binding (the client port already binds this way by default). The last case you 
describe works only because all the containers are running on the same 
machine, in the same bridge network. A more realistic configuration would be 3 
machines as follows:
 * Server 1, Server IP: 10.0.0.2, ZooKeeper container IP: 172.17.0.2 (from 
bridge network)
 * Server 2, Server IP: 10.0.0.3, ZooKeeper container IP: 172.17.0.2 (from 
bridge network)
 * Server 3, Server IP: 10.0.0.4, ZooKeeper container IP: 172.17.0.2 (from 
bridge network)

In this case you will need a ZooKeeper config like the following, since the 
servers can only be reached via the server IPs, not the container IPs:
server.1=10.0.0.2:2888:3888
server.2=10.0.0.3:2888:3888
server.3=10.0.0.4:2888:3888
However, binding will fail because the service cannot bind to the host's 
address when running in a bridge network. Using 0.0.0.0 would let the bind 
succeed and make the ports reachable from other hosts, but it will then fail 
due to the configuration mismatch in the shared quorum config.
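
For illustration, the per-server variant alluded to here (and the same pattern 
as the attached zoo.cfg, per the issue description below, where each server 
substitutes 0.0.0.0 for its own entry) would look like this on server 1:
server.1=0.0.0.0:2888:3888
server.2=10.0.0.3:2888:3888
server.3=10.0.0.4:2888:3888
Each server's file then differs in which entry is 0.0.0.0, which is the 
mismatch in the shared quorum config mentioned above.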

You can use host networking for the containers, but that seems more of a 
workaround. There are various reasons folks may not want to run the container 
with the host network configuration.
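
For completeness, a rough sketch of that workaround (the image name and 
mounted config path are placeholders, not taken from this issue):
docker run --network host -v /path/to/zoo.cfg:/conf/zoo.cfg zookeeper:3.6.1
With host networking the container binds directly on the 10.0.0.x address, so 
the plain server IPs in the quorum config work, at the cost of losing the 
isolation that bridge mode provides.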

For example, other services like Kafka have configuration that separates the 
"advertised" address (in this case it would be the server's public IP, 10.0...) 
from the binding address (0.0.0.0). I'm not sure whether ZooKeeper would 
benefit from (or maybe already has) a way to distinguish between the two. The 
multi-address support may complicate this.
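
For comparison, a minimal sketch of that Kafka pattern, using Kafka's 
listeners / advertised.listeners broker properties and the server 1 IP from 
the example above:
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.0.0.2:9092
The broker binds on all interfaces but tells clients and other brokers to 
reach it at the routable host IP; something analogous for the quorum/election 
addresses is the kind of separation being suggested here.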

> Zookeeper clients timeout after leader change
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-3920
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3920
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.6.1
>            Reporter: Andre Price
>            Priority: Major
>         Attachments: stack.yml, zk_repro.zip
>
>
> [Sorry, I believe this is a dupe of 
> https://issues.apache.org/jira/browse/ZOOKEEPER-3828 and potentially 
> https://issues.apache.org/jira/browse/ZOOKEEPER-3466 
> But I am not able to attach files there for some reason, so I am creating a 
> new issue which hopefully allows me]
> We are encountering an issue where failing over from the leader results in 
> zookeeper clients not being able to connect successfully. They time out 
> waiting for a response from the server. We are attempting to upgrade some 
> existing zookeeper clusters from 3.4.14 to 3.6.1 (not sure if relevant, but 
> stating it in case it helps with pinpointing the issue), which is effectively 
> blocked by this issue. We perform the rolling upgrade (followers first, then 
> leader last) and it seems to go successfully by all indicators. But we end up 
> in the state described in this issue, where if the leader changes (either due 
> to restart or stopping) the cluster does not seem able to start new sessions.
> I've gathered some TRACE logs from our servers and will attach them in the 
> hope they can help figure this out. 
> Attached zk_repro.zip which contains the following:
>  * zoo.cfg used in one of the instances (they are all the same except for the 
> local server's ip being 0.0.0.0 in each)
>  * zoo.cfg.dynamic.next (don't think this is used anywhere but is written by 
> zookeeper at some point - I think when the first 3.6.1 container becomes 
> leader based on the value – the file is in all containers and is the same in 
> all as well)
>  * s{1,2,3}_zk.log - logs from each of the 3 servers. Estimated time of 
> repro start indicated by "// REPRO START" text and whitespace in logs
>  * repro_steps.txt - rough steps executed that result in the server logs 
> attached
>  
> I'll summarize the repro here also:
>  # Initially it appears to be a healthy 3 node ensemble all running 3.6.1. 
> Server ids are 1,2,3 and 3 is the leader. Dynamic config/reconfiguration is 
> disabled.
>  # Invoke srvr on each node (to verify setup and also create a bookmark in 
> the logs)
>  # Do a zkCli get of /zookeeper/quota which succeeds
>  # Do a restart of the leader (to same image/config) (server 2 now becomes 
> leader, 3 is back as follower)
>  # Try to perform the same zkCli get which times out (this get is done within 
> the container)
>  # Try to perform the same zkCli get but from another machine, this also 
> times out
>  # Invoke srvr on each node again (to verify that 2 is now the 
> leader/bookmark)
>  # Do a restart of server 2 (3 becomes leader, 2 follower)
>  # Do a zkCli get of /zookeeper/quota which succeeds
>  # Invoke srvr on each node again (to verify that 3 is leader)
> I tried to keep the other ZK traffic to a minimum, but there are likely some 
> periodic mntr requests mixed in from our metrics scraper.
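>
> For reference, rough sketches of the commands behind steps 2-3 above (the 
> exact invocations and the default client port 2181 are assumptions):
> echo srvr | nc localhost 2181
> zkCli.sh -server localhost:2181 get /zookeeper/quota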



