[ https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867711#comment-13867711 ]
Hudson commented on ZOOKEEPER-1057:
-----------------------------------
SUCCESS: Integrated in ZooKeeper-trunk #2181 (See
[https://builds.apache.org/job/ZooKeeper-trunk/2181/])
ZOOKEEPER-1057. zookeeper c-client, connection to offline server fails to
successfully fallback to second zk host (Germán Blanco via michim) (michim:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556948)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/c/tests/TestClient.cc
> zookeeper c-client, connection to offline server fails to successfully
> fallback to second zk host
> -------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1057
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1057
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.3.1, 3.3.2, 3.3.3
> Environment: snowdutyrise-lm ~/-> uname -a
> Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01
> PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
> also observed on:
> 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
> Reporter: Woody Anderson
> Assignee: Michi Mutsuzaki
> Priority: Blocker
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1057-b3.4.patch, ZOOKEEPER-1057.patch,
> ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch,
> ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch
>
>
> Hello, I'm a contributor to the node.js zookeeper module:
> https://github.com/yfinkelstein/node-zookeeper
> I'm using zk 3.3.3 for the purposes of this issue, but I have validated that
> it also fails on 3.3.1 and 3.3.2.
> I'm having an issue when trying to connect while one of my zookeeper servers
> is offline.
> If the first server attempted is online, all is good.
> If the offline server is attempted first, then the client is never able to
> connect to _any_ server.
> Inside zookeeper.c a connection loss (-4) is received, the socket is closed
> and buffers are cleaned up. The client then attempts the next server in the
> list and creates a new socket (which gets the same fd as the previously
> closed socket), but connecting fails, and it continues to fail seemingly
> forever.
> The nature of this "fail" is not that it gets -4 connection loss errors, but
> that zookeeper_interest doesn't see any activity on the socket before the
> user-provided timeout kicks things out. I don't want to have to wait 5
> minutes, even if I could make myself.
> This is the message that follows the connection loss:
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket
> [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection
> timed out (exceeded timeout by 3ms)
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest
> returned error: -7 - operation timeout
> While investigating, I decided to comment out close(zh->fd) in handle_error
> (zookeeper.c#1153).
> Now everything works (obviously I'm leaking an fd). Connecting to the second
> host works immediately.
> This is the behavior I'm looking for, though I clearly don't want to leak the
> fd, so I'm wondering why the fd re-use is causing this issue.
> close() is not returning an error (I checked, even though the current code
> assumes success).
> I'm on OS X 10.6.7.
> I tried adding a setsockopt SO_LINGER (though I didn't want that to be a
> solution); it didn't work.
> Full debug traces are included in the issue here:
> https://github.com/yfinkelstein/node-zookeeper/issues/6
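
The description notes that the socket created for the second host gets the
same fd as the one just closed. A minimal sketch of that one observation,
using plain POSIX calls, is below. The function name fd_is_reused_after_close
is hypothetical and not part of zookeeper.c, and the sketch assumes a
single-threaded process in which nothing else opens a descriptor between the
close() and the second socket(). It only shows the descriptor number being
handed out again; it does not reproduce or explain the stalled connect
reported above.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from zookeeper.c): open a TCP socket, close it,
 * then open another one and compare the descriptor numbers.
 *
 * Returns 1 if the second socket received the same fd number as the first,
 * 0 if it received a different one, and -1 if any call failed.
 */
int fd_is_reused_after_close(void)
{
    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0)
        return -1;

    /* Unlike the code path described in the report, check close()'s result. */
    if (close(first) != 0)
        return -1;

    /* The kernel hands out the lowest free descriptor number, so with no
     * other descriptor opened in between, this is normally the same number. */
    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

Because the number is reused, any state a caller keeps keyed on the integer
fd (for example a poll or select registration) refers to the new socket after
the reconnect, which is one reason to re-register interest whenever the
handle's fd changes. Whether that is related to the behavior in this issue is
not established by the report.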