From my brief digging, my feeling was that the Java way of doing it was better: StaticHostProvider is the only one that increments pointers and gives out addresses, and the caller doesn't do any of this... But this may be too much of a change for C.
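
As a rough illustration of that idea, here is a minimal sketch in C. It is a simplified model, not the actual client code: `host_list`, `host_list_next`, `buggy_attempt` and `fixed_attempt` are hypothetical names, and the reconfig phase with its separate old and new lists is left out. It only shows why advancing the cursor in both the connect path and the error path skips a server, and how a single owner of the cursor avoids that.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the client's server list. */
typedef struct {
    const char *addrs[2];
    int count;
    int next; /* index of the next server to hand out */
} host_list;

/* The only function meant to move the cursor: return the current
 * address, then advance, wrapping around at the end of the list. */
static const char *host_list_next(host_list *h) {
    assert(h->count > 0);
    const char *addr = h->addrs[h->next];
    h->next = (h->next + 1) % h->count;
    return addr;
}

/* Models the reported behavior: the connect path picks a server and
 * advances, then the error path advances the cursor a second time. */
static const char *buggy_attempt(host_list *h) {
    const char *addr = host_list_next(h); /* connect path */
    h->next = (h->next + 1) % h->count;   /* error path advances again */
    return addr;
}

/* Models the single-owner approach: the error path leaves the cursor
 * alone, so each failed attempt moves exactly one position. */
static const char *fixed_attempt(host_list *h) {
    return host_list_next(h);
}
```

In this model, with two servers, repeated calls to `buggy_attempt` keep returning the first server, while repeated calls to `fixed_attempt` alternate between the two.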

On Jul 6, 2016 03:53, "Flavio Junqueira (JIRA)" <[email protected]> wrote:
>
> [
> https://issues.apache.org/jira/browse/ZOOKEEPER-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364131#comment-15364131
> ]
>
> Flavio Junqueira commented on ZOOKEEPER-2466:
> ---------------------------------------------
>
> [~shralex] Good catch, it is exactly the same problem. The description
> about a list of two servers, but it is an issue in general that we skip one
> server of the list every time.
>
> [~hanm] The test case isn't related to reconfiguration, that's correct.
> However, zh->reconfig is set to 1 initially according to the logic we have
> implemented. That's what I observed while tracing the execution. The fact
> that it is set to 1 initially actually changes the lists we are getting the
> server addresses from (there are _old and _new lists in the handle).
>
> There isn't much in the output, but here is a sample:
>
> {noformat}
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1027: Client
> environment:zookeeper.version=zookeeper C client 3.5.2
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1031: Client environment:
> host.name=fpj-test-apache-01
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1038: Client environment:
> os.name=Linux
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1039: Client
> environment:os.arch=4.4.0-28-generic
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1040: Client
> environment:os.version=#47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1048: Client environment:
> user.name=fpj
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1056: Client
> environment:user.home=/root
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@log_env@1068: Client
> environment:user.dir=/home/fpj/code/zookeeper-3.5.2-alpha/src/c
> 2016-07-05 18:35:50,174:42240:ZOO_INFO@zookeeper_init_internal@1111:
> Initiating client connection, host=127.0.0.1:22182,127.0.0.1:22181
> sessionTimeout=10000 watcher=0x447050 sessionId=0 sessionPasswd=<null>
> context=0x7ffcc708fec0 flags=0
> 2016-07-05 18:35:51,174:42240:ZOO_WARN@get_next_server_in_reconfig@1256:
> [OLD] count=0 capacity=0 next=0 hasnext=0
> 2016-07-05 18:35:51,174:42240:ZOO_WARN@get_next_server_in_reconfig@1259:
> [NEW] count=2 capacity=16 next=0 hasnext=1
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1268:
> Using next from NEW=127.0.0.1:22182
> 2016-07-05 18:35:51,175:42240:ZOO_ERROR@handle_socket_error_msg@2353:
> Socket [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused):
> server refused to accept the client
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1256:
> [OLD] count=0 capacity=0 next=0 hasnext=0
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1259:
> [NEW] count=2 capacity=16 next=1 hasnext=1
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1268:
> Using next from NEW=127.0.0.1:22181
> 2016-07-05 18:35:51,175:42240:ZOO_ERROR@handle_socket_error_msg@2353:
> Socket [127.0.0.1:22181] zk retcode=-4, errno=111(Connection refused):
> server refused to accept the client
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1256:
> [OLD] count=0 capacity=0 next=0 hasnext=0
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1259:
> [NEW] count=2 capacity=16 next=2 hasnext=0
> 2016-07-05 18:35:51,175:42240:ZOO_WARN@get_next_server_in_reconfig@1279:
> Failed to find either new or old
> 2016-07-05 18:35:51,175:42240:ZOO_ERROR@handle_socket_error_msg@2353:
> Socket [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused):
> server refused to accept the client
> 2016-07-05 18:35:51,175:42240:ZOO_ERROR@handle_socket_error_msg@2353:
> Socket [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused):
> server refused to accept the client
> 2016-07-05 18:35:51,176:42240:ZOO_ERROR@handle_socket_error_msg@2353:
> Socket [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused):
> server refused to accept the client
> 2016-07-05 18:35:51,176:42240:ZOO_ERROR@handle_socket_error_msg@2353:
> Socket [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused):
> server refused to accept the client
> <This line keeps repeating>
> {noformat}
>
> No server seems to be up for the client to connect, which I don't
> understand the reason, but I've focused mostly on why the address is the
> same after some point rather than alternating between the two addresses.
>
> > Client skips servers when trying to connect
> > -------------------------------------------
> >
> > Key: ZOOKEEPER-2466
> > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
> > Project: ZooKeeper
> > Issue Type: Bug
> > Components: c client
> > Reporter: Flavio Junqueira
> > Assignee: Flavio Junqueira
> > Priority: Critical
> > Fix For: 3.5.3, 3.6.0
> >
> >
> > I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and
> > I observed the following behavior. The list of servers to connect contains
> > two servers, let's call them S1 and S2. The client never connects, but the
> > odd bit is the sequence of servers that the client tries to connect to:
> > {noformat}
> > S1
> > S2
> > S1
> > S1
> > S1
> > <keeps repeating S1>
> > {noformat}
> > It intrigued me that S2 is only tried once and never again. Checking the
> > code, here is what happens. Initially, {{zh->reconfig}} is 1, so in
> > {{zoo_cycle_next_server}} we return an address from
> > {{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}} in
> > this test case. The attempt to connect fails, and {{handle_error}} is
> > invoked in the error handling path. {{handle_error}} actually invokes
> > {{addrvec_next}} which changes the address pointer to the next server on
> > the list.
> > After two attempts, it decides that it has tried all servers in
> > {{zoo_cycle_next_server}} and sets {{zh->reconfig}} to zero. Once
> > {{zh->reconfig == 0}}, we have that each call to {{zoo_cycle_next_server}}
> > moves the address pointer to the next server in {{zh->addrs}}. But, given
> > that {{handle_error}} also moves the pointer to the next server, we end up
> > moving the pointer ahead twice upon every failed attempt to connect, which
> > is wrong.
> >
> >
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
