[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354624#comment-15354624
 ] 

Michael Han commented on ZOOKEEPER-2447:
----------------------------------------

I think this is an optimization, and my question is whether it is worth the 
effort :-) Does introducing another timeout at a higher level of the stack 
help reduce the delay? I am curious to see whether there is any product 
scenario or data that could back up this change, because it seems to me that 
the time savings would depend on the current connection timeout and on the 
value this patch feeds into isReachable. Also, as Edward pointed out, the 
timeout value should be configurable instead of hardcoded (and the 
documentation needs to be updated as well).
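
To make the configurability point concrete, here is a minimal sketch of a 
reachability filter whose timeout is read from a system property instead of 
being hardcoded. It is not the attached patch: the class name 
ReachabilityFilter, the property name zookeeper.client.reachableTimeoutMs, 
and the 4000 ms default are all hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper.
final class ReachabilityFilter {
    // Assumed property name; ZooKeeper does not define this setting.
    static final String TIMEOUT_PROPERTY = "zookeeper.client.reachableTimeoutMs";
    // Assumed default, taken from the example value in the issue description.
    static final int DEFAULT_TIMEOUT_MS = 4000;

    // Reads the probe timeout from the system property, or uses the default.
    static int reachableTimeoutMs() {
        return Integer.getInteger(TIMEOUT_PROPERTY, DEFAULT_TIMEOUT_MS);
    }

    // Returns only the candidates that answer a reachability probe in time.
    static List<InetSocketAddress> filterReachable(
            List<InetSocketAddress> candidates, int timeoutMs) {
        List<InetSocketAddress> reachable = new ArrayList<>();
        for (InetSocketAddress candidate : candidates) {
            InetAddress addr = candidate.getAddress();
            if (addr == null) {
                continue; // unresolved host name, nothing to probe
            }
            try {
                if (addr.isReachable(timeoutMs)) {
                    reachable.add(candidate);
                }
            } catch (IOException e) {
                // A network error during the probe is treated as unreachable.
            }
        }
        return reachable;
    }
}
```

Two things to keep in mind with this approach. The probes run one after 
another, so each unreachable host can cost up to timeoutMs before the client 
even starts connecting, which is why the net saving depends on how this value 
compares with the connection timeout. Also, InetAddress.isReachable typically 
relies on ICMP echo or a TCP connection to port 7, so a healthy server behind 
a firewall that blocks both could be filtered out, and the caller would need 
to decide what to do if the resulting list is empty.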

> Zookeeper adds  good delay when one of the quorum host is not reachable
> -----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2447
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2447
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Vishal Khandelwal
>            Assignee: Vishal Khandelwal
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2447.3.5.patch, withfix.txt, withoutFix.txt
>
>
> StaticHostProvider.resolveAndShuffle adds every valid quorum address to a 
> list, shuffles the list, and returns it to the client connection class. If, 
> after shuffling, the first node happens to be unreachable, 
> ClientCnxn.SendThread.run keeps trying to connect to the failed host until a 
> timeout expires and only then moves on to a different node. This adds a 
> random delay to establishing the ZooKeeper connection whenever a host is 
> down. Instead, we could check whether each host is reachable in 
> StaticHostProvider and skip it if isReachable returns false, the same way we 
> already do for UnknownHostException.
> This can be tested with the following test code by supplying a valid host 
> that is not reachable. For a quick test, comment out 
> Collections.shuffle(tmpList, sourceOfRandomness); in 
> StaticHostProvider.resolveAndShuffle.
> {code}
>   @Test
>   public void test() throws Exception {
>     EventsWatcher watcher = new EventsWatcher();
>     QuorumUtil qu = new QuorumUtil(1);
>     qu.startAll();
>
>     // Replace <hostname> with a valid host that resolves but is not reachable.
>     ZooKeeper zk = new ZooKeeper(
>         "<hostname>:2181," + qu.getConnString(), 180 * 1000, watcher);
>
>     watcher.waitForConnected(CONNECTION_TIMEOUT * 5);
>     Assert.assertTrue("connection Established", watcher.isConnected());
>     zk.close();
>   }
> {code}
> The following fix can be added to StaticHostProvider.resolveAndShuffle:
> {code}
>   // 4000 ms is only an example value; it can be some other value.
>   if (taddr.isReachable(4000)) {
>     tmpList.add(new InetSocketAddress(taddr, address.getPort()));
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
