[
https://issues.apache.org/jira/browse/ZOOKEEPER-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139360#comment-13139360
]
Patrick Hunt edited comment on ZOOKEEPER-1271 at 10/29/11 4:46 PM:
-------------------------------------------------------------------
The error handling added to ZOOKEEPER-1174 is causing this bug.
{noformat}
try {
    sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
    boolean immediateConnect = sock.connect(addr);
    if (immediateConnect) {
        sendThread.primeConnection();
    }
} catch (IOException e) {
    LOG.error("Unable to open socket to " + addr);
    sock.close();
}
{noformat}
If an exception is thrown inside the try block, the socket is closed but sockKey
is left set. As a result the client will not attempt to reconnect to the server
(normally it would keep retrying every second or so). I think the bug here
is that the exception should be rethrown; otherwise the 'cleanup' routine in
SendThread.run will not be executed.
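
To make the proposed change concrete, here is a minimal, self-contained sketch of the rethrow pattern. It is not the ZooKeeper code: RethrowSketch, Connector, startConnect, cleanup and run are hypothetical stand-ins, and the loop assumes (as the description above implies) that a new connection is attempted only while sockKey is unset.

```java
import java.io.IOException;

// Hypothetical stand-in for the client socket wrapper; not the real ZooKeeper class.
class RethrowSketch {
    // Hypothetical hook standing in for sock.connect(addr); throws to simulate a failure.
    interface Connector {
        void connect() throws IOException;
    }

    Object sockKey;    // stands in for the SelectionKey field
    int cleanupCount;  // counts how often the cleanup path ran

    // Same shape as the quoted try block, except the exception is rethrown.
    void startConnect(Connector connector) throws IOException {
        try {
            sockKey = new Object();   // stands in for sock.register(...)
            connector.connect();      // stands in for sock.connect(addr)
        } catch (IOException e) {
            // The real code would log and close the socket here.
            throw e;                  // propagate so the caller's cleanup runs
        }
    }

    // Stand-in for the cleanup in SendThread.run: clears the key.
    void cleanup() {
        sockKey = null;
        cleanupCount++;
    }

    // Simplified run loop (an assumption): reconnect only while sockKey is unset.
    int run(Connector connector, int maxPasses) {
        int attempts = 0;
        for (int pass = 0; pass < maxPasses; pass++) {
            try {
                if (sockKey == null) {
                    attempts++;
                    startConnect(connector);
                    return attempts;  // connected
                }
                // Key still set: the loop sees a connection in progress and does not retry.
            } catch (IOException e) {
                cleanup();            // reached only because startConnect rethrows
            }
        }
        return attempts;
    }
}
```

If the catch block in startConnect swallowed the exception instead, sockKey would stay set after the first failure and this loop would never make a second attempt, which matches the behaviour reported in this issue.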
> testEarlyLeaderAbandonment failing on solaris - clients not retrying
> connection
> -------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1271
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1271
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Patrick Hunt
> Priority: Blocker
> Fix For: 3.4.0, 3.5.0
>
> Attachments: solarisClientFailure.txt.gz
>
>
> See:
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_solaris/1/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testEarlyLeaderAbandonment/
> Notice that the clients attempt to connect before the servers have bound,
> then 30 seconds later, after seemingly no further client activity, we see:
> 2011-10-28 21:40:56,828 [myid:] - INFO
> [main-SendThread(localhost:11227):ClientCnxn$SendThread@1057] - Client
> session timed out, have not heard from server in 30032ms for sessionid 0x0,
> closing socket connection and attempting reconnect
> I believe this is different from ZOOKEEPER-1270 because in the 1270 case it
> seems like the clients are attempting to connect but the servers are not
> accepting (notice the stat commands are being dropped due to no server
> running)