[
https://issues.apache.org/jira/browse/ZOOKEEPER-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376300#comment-15376300
]
Michael Han commented on ZOOKEEPER-2466:
----------------------------------------
bq. but that actually could be a bug.
Good point.
bq. have a second parameter for zoo_cycle_next_server so that we update
according to the second parameter rather than always updating zh->addr_cur.
I like this solution, which parameterizes zoo_cycle_next_server so that the
state it changes is explicit in the interface, and uses zoo_cycle_next_server
instead of the low-level addrvec function, because we could also be in reconfig
mode when this 'else' branch is executed.
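
To make the idea concrete, here is a minimal sketch of one possible reading of
the proposal. It is not the actual client code: every sketch_* type and
function is a hypothetical stand-in for zhandle_t, addrvec_t and
zoo_cycle_next_server, and reconfig mode is reduced to "use the new list",
which is much simpler than what get_next_server_in_reconfig really does. The
second parameter names the destination to update, so the caller decides
whether addr_cur changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified address list; not the real addrvec_t. */
typedef struct {
    const char *names[3];
    int count;
    int next; /* index of the entry the next call will return */
} sketch_addrvec;

/* Hypothetical simplified handle; field names mirror those in the issue. */
typedef struct {
    sketch_addrvec addrs;     /* models zh->addrs */
    sketch_addrvec addrs_new; /* models zh->addrs_new */
    int reconfig;             /* models zh->reconfig */
    const char *addr_cur;     /* models zh->addr_cur */
} sketch_handle;

/* Advance the cursor; store the selected entry in *out only if out != NULL. */
void sketch_addrvec_next(sketch_addrvec *v, const char **out) {
    assert(v->count > 0);
    if (out != NULL) {
        *out = v->names[v->next];
    }
    v->next = (v->next + 1) % v->count;
}

/* Parameterized cycle step: the caller states which destination is updated.
 * Assumption: in reconfig mode the new list is used; the real logic differs. */
void sketch_cycle_next_server(sketch_handle *zh, const char **dest) {
    sketch_addrvec *list = zh->reconfig ? &zh->addrs_new : &zh->addrs;
    sketch_addrvec_next(list, dest);
}
```

In this sketch, passing &zh->addr_cur selects the next server to connect to,
while passing NULL only moves the cursor. Either way the call goes through the
same reconfig check, which is the reason for preferring it over calling the
addrvec function directly.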
> Client skips servers when trying to connect
> -------------------------------------------
>
> Key: ZOOKEEPER-2466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Reporter: Flavio Junqueira
> Assignee: Flavio Junqueira
> Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
>
> I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and I
> observed the following behavior. The list of servers to connect to contains two
> servers, let's call them S1 and S2. The client never connects, but the odd
> bit is the sequence of servers that the client tries to connect to:
> {noformat}
> S1
> S2
> S1
> S1
> S1
> <keeps repeating S1>
> {noformat}
> It intrigued me that S2 is only tried once and never again. Checking the
> code, here is what happens. Initially, {{zh->reconfig}} is 1, so in
> {{zoo_cycle_next_server}} we return an address from
> {{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}} in
> this test case. The attempt to connect fails, and {{handle_error}} is invoked
> in the error handling path. {{handle_error}} actually invokes
> {{addrvec_next}} which changes the address pointer to the next server on the
> list.
> After two attempts, the client decides in {{zoo_cycle_next_server}} that it
> has tried all servers and sets {{zh->reconfig}} to zero. Once
> {{zh->reconfig == 0}}, each call to {{zoo_cycle_next_server}} moves the
> address pointer to the next server in {{zh->addrs}}. But because
> {{handle_error}} also moves the pointer to the next server, the pointer
> advances twice on every failed connection attempt, which is wrong. With two
> servers, two advances always land back on the same server, so S2 is never
> retried.
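
The double advance described in the issue can be illustrated with a small
model. This is only a sketch under stated assumptions: sim_addrvec and the
sim_* functions are hypothetical stand-ins, not the real addrvec_t,
zoo_cycle_next_server or handle_error, and the model covers only the phase
where zh->reconfig is already 0.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the client's address list; not the real addrvec_t. */
typedef struct {
    const char *names[2];
    int count;
    int next; /* index of the entry the next call will return */
} sim_addrvec;

/* Return the current entry and advance the cursor, wrapping at the end. */
const char *sim_next(sim_addrvec *v) {
    assert(v->count > 0);
    const char *cur = v->names[v->next];
    v->next = (v->next + 1) % v->count;
    return cur;
}

/* Simulate failed connection attempts with reconfig == 0.
 * When error_path_advances is nonzero, the failure path moves the cursor a
 * second time, as the issue describes for handle_error.
 * Returns how many distinct servers were tried. */
int sim_distinct_servers_tried(int attempts, int error_path_advances) {
    sim_addrvec v = { { "S1", "S2" }, 2, 0 };
    int tried[2] = { 0, 0 };
    for (int i = 0; i < attempts; i++) {
        tried[v.next] = 1;
        printf("attempt %d -> %s\n", i + 1, sim_next(&v)); /* cycle step */
        if (error_path_advances) {
            (void)sim_next(&v); /* second advance on the failure path */
        }
    }
    return tried[0] + tried[1];
}
```

With two servers, sim_distinct_servers_tried(6, 1) reports a single server
tried, because advancing twice per attempt wraps back to the same index, while
sim_distinct_servers_tried(6, 0) reports both.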