[
https://issues.apache.org/jira/browse/ZOOKEEPER-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046298#comment-14046298
]
Orion Hodson commented on ZOOKEEPER-1933:
-----------------------------------------
Hi Raul, I'm responsible for the change in select() and also a relative
neophyte in the ZooKeeper code.
From scanning the patch alone it may not be obvious, but there is an
isunrecoverable() check in the same do_io() loop. It comes just after
zookeeper_process().
Previously there was no error checking for select() in the Win32 case. The
issue we've seen is that when the socket is closed remotely, the descriptor
becomes bad and select() fails immediately without waiting. There is no
change in state at this point, just a bad descriptor. The descriptor was never
removed from the fd_sets, so the loop would spin and burn CPU. The variable
interest in the loop serves two purposes, and because it wasn't zeroed out in
the error case, after the select() error the code would still attempt to
process I/O on the socket and then fail there. With the change here, we see
sessions being re-established as expected.
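
For readers without the patch open, here is a minimal sketch of the idea, not
the patch itself. It assumes POSIX select() rather than Winsock (where the
failure is reported as SOCKET_ERROR and the code comes from WSAGetLastError()),
and the names poll_once, INTEREST_READ and INTEREST_WRITE are hypothetical
stand-ins, not identifiers from the ZooKeeper source. It also assumes the "two
purposes" of interest are the requested events going in and the ready events
coming out.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical flags standing in for the client's read/write interest bits. */
enum { INTEREST_READ = 1, INTEREST_WRITE = 2 };

/*
 * Wait for I/O on fd. On entry *interest says which events to wait for;
 * on return it says which events are actually ready.
 * If select() fails (for example because fd is no longer a valid
 * descriptor), *interest is zeroed so the caller does not go on to
 * attempt I/O on the socket, and -1 is returned.
 */
static int poll_once(int fd, int *interest, struct timeval *tv)
{
    fd_set rfds, wfds;
    int rc;
    int ready = 0;

    /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
    if (fd < 0 || fd >= FD_SETSIZE) {
        *interest = 0;
        return -1;
    }

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    if (*interest & INTEREST_READ)
        FD_SET(fd, &rfds);
    if (*interest & INTEREST_WRITE)
        FD_SET(fd, &wfds);

    rc = select(fd + 1, &rfds, &wfds, NULL, tv);
    if (rc < 0) {
        /* Error path: report no ready events instead of stale requests. */
        *interest = 0;
        return -1;
    }

    if (FD_ISSET(fd, &rfds))
        ready |= INTEREST_READ;
    if (FD_ISSET(fd, &wfds))
        ready |= INTEREST_WRITE;
    *interest = ready;
    return rc;
}
```

A caller in this sketch would skip its I/O processing step whenever poll_once
returns -1 and fall through to its own recoverability check.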
Thanks
Orion
> Windows release build of zk client cannot connect to zk server
> --------------------------------------------------------------
>
> Key: ZOOKEEPER-1933
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1933
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.4.6
> Reporter: Norris Lee
> Assignee: Orion Hodson
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1933-2.patch, ZOOKEEPER-1933-3.patch,
> ZOOKEEPER-1933.patch, ZOOKEEPER-1933.patch, ZOOKEEPER-1933.patch
>
>
> When building zookeeper in Visual Studio in debug mode, the client can
> connect to the server without error. When building in release mode, I get a
> continuous error message:
> {code}
> 2014-06-02 11:25:20,070:7144(0xc84):ZOO_INFO@zookeeper_init_internal@1008:
> Initiating client connection, host=192.168.39.43:5181 sessionTimeout=30000
> watcher=10049C90 sessionId=0 sessionPasswd=<null> context=001FC0F0 flags=0
> 2014-06-02 11:25:20,072:7144(0xc84):ZOO_DEBUG@start_threads@221: starting
> threads...
> 2014-06-02 11:25:20,072:7144(0x1ea0):ZOO_DEBUG@do_completion@460: started
> completion thread
> 2014-06-02 11:25:20,072:7144(0x1e08):ZOO_DEBUG@do_io@403: started IO thread
> 2014-06-02
> 11:25:20,072:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1148: [OLD]
> count=0 capacity=0 next=0 hasnext=0
> 2014-06-02
> 11:25:20,072:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1151: [NEW]
> count=1 capacity=16 next=0 hasnext=1
> 2014-06-02
> 11:25:20,072:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1160: Using
> next from NEW=192.168.39.43:5181
> 2014-06-02 11:25:20,072:7144(0x1e08):ZOO_DEBUG@zookeeper_interest@1992: [zk]
> connect()
> 2014-06-02 11:25:20,158:7144(0x1e08):ZOO_ERROR@handle_socket_error_msg@1847:
> Socket [192.168.39.43:5181] zk retcode=-4, errno=10035(Unknown error): failed
> to send a handshake packet: Unknown error
> 2014-06-02 11:25:20,158:7144(0x1e08):ZOO_DEBUG@handle_error@1595: Previous
> connection=[192.168.39.43:5181] delay=0
> 2014-06-02
> 11:25:20,158:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1148: [OLD]
> count=0 capacity=0 next=0 hasnext=0
> 2014-06-02
> 11:25:20,158:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1151: [NEW]
> count=1 capacity=16 next=0 hasnext=1
> 2014-06-02
> 11:25:20,158:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1160: Using
> next from NEW=192.168.39.43:5181
> 2014-06-02 11:25:20,158:7144(0x1e08):ZOO_DEBUG@zookeeper_interest@1992: [zk]
> connect()
> 2014-06-02 11:25:20,159:7144(0x1e08):ZOO_ERROR@handle_socket_error_msg@1847:
> Socket [192.168.39.43:5181] zk retcode=-4, errno=10035(Unknown error): failed
> to send a handshake packet: Unknown error
> 2014-06-02 11:25:20,159:7144(0x1e08):ZOO_DEBUG@handle_error@1595: Previous
> connection=[192.168.39.43:5181] delay=0
> 2014-06-02
> 11:25:20,159:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1148: [OLD]
> count=0 capacity=0 next=0 hasnext=0
> 2014-06-02
> 11:25:20,159:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1151: [NEW]
> count=1 capacity=16 next=0 hasnext=1
> 2014-06-02
> 11:25:20,159:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1160: Using
> next from NEW=192.168.39.43:5181
> 2014-06-02 11:25:20,159:7144(0x1e08):ZOO_DEBUG@zookeeper_interest@1992: [zk]
> connect()
> 2014-06-02 11:25:20,159:7144(0x1e08):ZOO_ERROR@handle_socket_error_msg@1847:
> Socket [192.168.39.43:5181] zk retcode=-4, errno=10035(Unknown error): failed
> to send a handshake packet: Unknown error
> 2014-06-02 11:25:20,159:7144(0x1e08):ZOO_DEBUG@handle_error@1595: Previous
> connection=[192.168.39.43:5181] delay=0
> 2014-06-02
> 11:25:20,159:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1148: [OLD]
> count=0 capacity=0 next=0 hasnext=0
> 2014-06-02
> 11:25:20,159:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1151: [NEW]
> count=1 capacity=16 next=0 hasnext=1
> 2014-06-02
> 11:25:20,159:7144(0x1e08):ZOO_DEBUG@get_next_server_in_reconfig@1160: Using
> next from NEW=192.168.39.43:5181
> 2014-06-02 11:25:20,159:7144(0x1e08):ZOO_DEBUG@zookeeper_interest@1992: [zk]
> connect()
> {code}