Hi Dan, The Apache JIRA is undergoing some issues (see here https://twitter.com/infrabot ) , but looks like it has come back.
Would you mind to repost this message there? :) Cheers, Ed On Wed, Jun 29, 2016 at 5:46 PM, Dan Benediktson < [email protected]> wrote: > Apologies for sending this on the mail list instead of through a comment on > the JIRA - I can't seem to get https://issues.apache.org/jira/ to load for > me. > > Given that the goal is to open a socket, implementing an isReachable() > method which is just opening a socket with a timeout seems wasteful. It > will add unnecessary additional latency (the RTT latency for TCP handshake) > and overhead to every connection, even successful ones where the ensemble > isn't suffering due to a down host; it'll also put a little extra load on > the servers, not just clients. Would it make sense to make > ClientCnxnSocket.open() implementations apply the timeout directly? > > As far as choosing the connection timeout to use, I would suggest just > using the time-sliced session timeout (I thought this was already done, but > perhaps it's only applied later in the connection handshake?) but applying > some reasonable lower bound (say, default of 1 second and overridable by a > Java system property). We actually have a fork of the ZK code which has > that minimum bound for connection timeout logic in the Java client, and I'd > be happy to prepare a patch for that part. It's pretty trivial code, but > it's important if you start using a combination of large ZK clusters and > very small session timeouts, which we have in some of our ensembles. > > Thanks, > Dan > > On Wed, Jun 29, 2016 at 12:27 PM, Andrew Purtell (JIRA) <[email protected]> > wrote: > > > > > [ > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355708#comment-15355708 > > ] > > > > Andrew Purtell commented on ZOOKEEPER-2447: > > ------------------------------------------- > > > > bq. An alternative would be to roll out our own equivalent of > > {{InetAddress#isReachable()}} as I scribbled below? > > Makes sense. > > > > > Zookeeper adds good delay when one of the quorum host is not reachable > > > ----------------------------------------------------------------------- > > > > > > Key: ZOOKEEPER-2447 > > > URL: > > https://issues.apache.org/jira/browse/ZOOKEEPER-2447 > > > Project: ZooKeeper > > > Issue Type: Bug > > > Affects Versions: 3.4.6, 3.5.0 > > > Reporter: Vishal Khandelwal > > > Assignee: Vishal Khandelwal > > > Fix For: 3.5.3, 3.6.0 > > > > > > Attachments: ZOOKEEPER-2447.3.5.patch, withfix.txt, > > withoutFix.txt > > > > > > > > > StaticHostProvider --> resolveAndShuffle method adds all of the address > > which are valid in the quorum to the list, shuffles them and sends back > to > > client connection class. If after shuffling if first node appear to be > the > > one which is not reachable, Clientcnx.SendThread.run will keep on > > connecting to the failure till a timeout and the moves to a different > node. > > This adds up random delay in zookeeper connection in case a host is down. > > Rather we could check if host is reachable in StaticHostProvider and > ignore > > isReachable is false. Same as we do for UnknownHostException Exception. > > > This can tested using following test code by providing a valid host > > which is not reachable. for quick test comment > Collections.shuffle(tmpList, > > sourceOfRandomness); in StaticHostProvider.resolveAndShuffle > > > {code} > > > @Test > > > public void test() throws Exception { > > > EventsWatcher watcher = new EventsWatcher(); > > > QuorumUtil qu = new QuorumUtil(1); > > > qu.startAll(); > > > > > > ZooKeeper zk = > > > new ZooKeeper("<hostnamet:2181," + qu.getConnString(), 180 * > > 1000, watcher); > > > > > > watcher.waitForConnected(CONNECTION_TIMEOUT * 5); > > > Assert.assertTrue("connection Established", watcher.isConnected()); > > > zk.close(); > > > } > > > {code} > > > Following fix can be added to StaticHostProvider.resolveAndShuffle > > > {code} > > > if(taddr.isReachable(4000 // can be some value)) { > > > tmpList.add(new InetSocketAddress(taddr, > > address.getPort())); > > > } > > > {code} > > > > > > > > -- > > This message was sent by Atlassian JIRA > > (v6.3.4#6332) > > >
