Hi Camille,

Thanks for the reply. Could you please go through the following cases:

Case-1:
Say I have 5 servers (zk1, zk2, zk3, zk4, zk5) and have configured sessionTimeOut = 60 secs. Then:

  readTimeOut = 60 * 2 / 3 = 40 secs
  connectionTimeOut = 60 / servers.length = 60 / 5 = 12 secs

step1: The client has established a connection with zk1.
step2: Shut down zk1 and zk2. Since readTimeOut is 40s, it takes 40s before the first retry to the next server.
step3: Say the client retries zk2; this takes at most 12s (connectionTimeOut). The client has now used 52 secs of its session, so only 8 secs remain before session expiration.

Retry intervals are as follows >> 40s, 12s, 8s

Case-2:
Also consider the 'R-O server' feature: 5 servers started with R-O mode enabled and sessionTimeOut = 60 secs configured.

step1: The client has established a connection with zk1.
step2: Shut down zk1, zk2 and zk3.
step3: The client spends 40s on readTimeOut, then 12s, then 8s, and the client session expires before the next retry. But zk4 and zk5 are running in R-O mode and could have retained the client session.

If we take 'servers.length' into account we can improve the retries, or shall we think of a better formula? (Note: even after considering servers.length, I feel there is still a small gap, since the fifth server is never retried.)

  readTimeOut = 60 * 2 / 5 (servers.length) = 24 secs

Retry intervals are as follows >> 24s, 12s, 12s, 12s

IMO, 'readTimeOut' is presently not calculated based on the quorum strength, but it would be good to have a shorter timeout so that retries are fairer.

Thanks,
Rakesh

________________________________________
From: Camille Fournier [[email protected]]
Sent: Tuesday, January 03, 2012 5:51 AM
To: [email protected]
Subject: Re: zookeeper client retry logic..

It's an interesting idea... can you explain more why you think it would be good to have a shorter timeout in the case of a longer list of servers?
Thanks,
C

On Mon, Jan 2, 2012 at 2:08 AM, Rakesh R <[email protected]> wrote:
> Hi everyone,
>
> In ClientCnxn, 'readTimeOut' is calculated as follows:
>
> readTimeOut = sessionTimeOut * 2 / 3; // here it is not considering the server list.
>
> If the server list grows to more than 3, it will not give a fair chance
> to retry all the servers (in the worst case).
>
> Can we think of changing the 'readTimeOut logic' to use
> serverslist.length instead of the constant/magic number '3'?
>
> For example:
> I have 5 servers and client sessionTimeOut = 120 secs
>
> readTimeOut = 120 * 2 / 3 = 80 secs
>
> In this case, it takes 80 secs for the first timeout if the connected
> server is not responding. This is a long time; if we consider the server
> list, it can retry the next server in <50 secs.
>
> Thanks & Regards,
> Rakesh
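
For reference, the retry arithmetic in Case-1 and Case-2 can be reproduced with a small sketch. This is only an illustration of the numbers in this thread, not the ClientCnxn code: the class RetrySchedule, the method intervals and the readDivisor parameter are hypothetical names. It assumes the first attempt consumes readTimeOut, every later attempt consumes connectionTimeOut, and the last attempt is cut short when the session expires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: models the retry intervals
// discussed in this thread under the assumptions stated above.
public class RetrySchedule {

    // readDivisor is 3 for the current formula (sessionTimeOut * 2 / 3)
    // and servers for the proposed one (sessionTimeOut * 2 / servers.length).
    static List<Integer> intervals(int sessionTimeoutSecs, int servers, int readDivisor) {
        int readTimeout = sessionTimeoutSecs * 2 / readDivisor;
        int connectTimeout = sessionTimeoutSecs / servers;
        if (readTimeout < 1 || connectTimeout < 1) {
            // Integer division rounded a timeout down to zero; the model breaks down.
            throw new IllegalArgumentException("timeouts must be at least 1 sec");
        }

        List<Integer> spentPerAttempt = new ArrayList<>();
        int remaining = sessionTimeoutSecs;
        int next = readTimeout; // first wait: the connected server stops responding
        while (remaining > 0) {
            int spent = Math.min(next, remaining); // session expiry truncates the last wait
            spentPerAttempt.add(spent);
            remaining -= spent;
            next = connectTimeout; // later waits: connecting to the next server
        }
        return spentPerAttempt;
    }

    public static void main(String[] args) {
        System.out.println(intervals(60, 5, 3)); // current formula:  [40, 12, 8]
        System.out.println(intervals(60, 5, 5)); // proposed formula: [24, 12, 12, 12]
    }
}
```

With the current divisor the client gets two full waits and a truncated third before the session expires; with the proposed divisor it gets four full waits, which is why zk5 is still not reached in the worst case.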
