[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649635#comment-16649635
 ] 

Jiafu Jiang commented on ZOOKEEPER-3099:
----------------------------------------

[~lvfangmin] Thanks for your advice. I have changed the title and the 
description.

> ZooKeeper cluster is unavailable for session_timeout time due to network 
> partition in a three-node environment.   
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3099
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3099
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client, java client
>    Affects Versions: 3.4.11, 3.5.4, 3.4.12, 3.4.13
>            Reporter: Jiafu Jiang
>            Priority: Major
>
>  
> The default readTimeout of the ZooKeeper client is 2/3 * session_time, and 
> the default connectTimeout is session_time/hostProvider.size(). If the 
> ZooKeeper cluster has 3 nodes, then connectTimeout is 1/3 * session_time.
>  
> Suppose we have three ZooKeeper servers deployed: zk1, zk2, and zk3, and zk3 
> is currently the leader. Client c1 is connected to zk2 (a follower). We then 
> shut down the network of zk3 (the leader) and, at the same time, client c1 
> begins to write some data to ZooKeeper. After a (syncLimit * tick) timeout, 
> zk2 will disconnect from the leader and begin a new election, and zk2 
> becomes the leader.
>  
> The write operation will not succeed because the leader is unavailable. It 
> will take at most readTimeout for c1 to discover the failure, after which 
> c1 will try to choose another ZooKeeper server. Unfortunately, c1 may choose 
> zk3, which is now unreachable, and c1 will then spend connectTimeout finding 
> out that zk3 is unusable. Notice that readTimeout + connectTimeout = 
> session_timeout in my case (three-node cluster).
>  
> Therefore, in this case, the ZooKeeper cluster is unavailable for a full 
> session timeout even though only one ZooKeeper server is unreachable due to 
> the network partition.
>  
> I have some suggestions:
>  # Allow the HostProvider used by ZooKeeper to be specified by an argument.
>  # Allow readTimeout to be specified as well.
>  
>  
>  
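
The timeout arithmetic in the description can be checked with a small 
standalone sketch. It only restates the two formulas quoted above using 
integer milliseconds; it is not the client's actual code. The class name, 
the method names, and the 30000 ms session timeout are hypothetical and 
chosen only for illustration.

```java
public class ClientTimeoutSketch {

    // readTimeout = 2/3 * session_time, as stated in the description.
    static int readTimeout(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    // connectTimeout = session_time / hostProvider.size(), as stated in the
    // description. serverCount stands in for hostProvider.size().
    static int connectTimeout(int sessionTimeoutMs, int serverCount) {
        return sessionTimeoutMs / serverCount;
    }

    // Worst case from the description: the client waits readTimeout to notice
    // the stalled write, then spends connectTimeout on the unreachable server.
    static int worstCaseUnavailable(int sessionTimeoutMs, int serverCount) {
        return readTimeout(sessionTimeoutMs)
                + connectTimeout(sessionTimeoutMs, serverCount);
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 30000; // hypothetical session timeout
        int serverCount = 3;          // zk1, zk2, zk3

        System.out.println("readTimeout    = "
                + readTimeout(sessionTimeoutMs) + " ms");
        System.out.println("connectTimeout = "
                + connectTimeout(sessionTimeoutMs, serverCount) + " ms");
        System.out.println("worst case     = "
                + worstCaseUnavailable(sessionTimeoutMs, serverCount) + " ms");
    }
}
```

With these assumed values the three results are 20000 ms, 10000 ms, and 
30000 ms, so the worst case equals the whole session timeout. With more than 
three servers connectTimeout shrinks and the sum drops below the session 
timeout, which is why the three-node case is the one singled out above.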



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
