Flavio Junqueira created ZOOKEEPER-2466:
-------------------------------------------
Summary: Client skips servers when trying to connect
Key: ZOOKEEPER-2466
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
Project: ZooKeeper
Issue Type: Bug
Components: c client
Reporter: Flavio Junqueira
Assignee: Flavio Junqueira
Priority: Critical
Fix For: 3.5.3, 3.6.0
I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and I
observed the following behavior. The list of servers to connect to contains two
servers, let's call them S1 and S2. The client never connects, but the odd bit
is the sequence of servers that the client tries to connect to:
{noformat}
S1
S2
S1
S1
S1
<keeps repeating S1>
{noformat}
It intrigued me that S2 is only tried once and never again. Checking the code,
here is what happens. Initially, {{zh->reconfig}} is 1, so in
{{zoo_cycle_next_server}} we return an address from
{{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}} in this
test case. The attempt to connect fails, and {{handle_error}} is invoked in the
error handling path. {{handle_error}} actually invokes {{addrvec_next}}, which
changes the address pointer to the next server on the list.
After two attempts, the client decides in {{zoo_cycle_next_server}} that it has
tried all servers and sets {{zh->reconfig}} to zero. Once
{{zh->reconfig == 0}}, each call to {{zoo_cycle_next_server}} moves the address
pointer to the next server in {{zh->addrs}}. But, given that {{handle_error}}
also moves the pointer to the next server, we end up moving the pointer ahead
twice upon every failed attempt to connect, which is wrong. With two servers,
moving ahead twice lands back on the same server, which is why S2 is never
tried again.
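
The double advance can be illustrated with a small toy model. This is only a
sketch under simplifying assumptions, not the client code: {{toy_addrvec}},
{{toy_advance}} and {{toy_simulate}} are hypothetical names, the reconfig
phase is left out, and the only thing modelled is a wrapping pointer that is
moved once by the cycle step and, optionally, once more by the error path.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the client's address list; NOT the real addrvec type. */
typedef struct {
    int count;  /* number of servers in the list */
    int next;   /* index of the server the next attempt will use */
} toy_addrvec;

/* Move the pointer ahead by one, wrapping around at the end of the list. */
static void toy_advance(toy_addrvec *v) {
    v->next = (v->next + 1) % v->count;
}

/*
 * Simulate `attempts` failed connection attempts and record in `tried` the
 * server index used by each one. When advance_in_error is nonzero, the error
 * path moves the pointer a second time, as described in this report.
 */
void toy_simulate(int servers, int attempts, int advance_in_error, int *tried) {
    assert(servers > 0);
    toy_addrvec v = { servers, 0 };
    for (int i = 0; i < attempts; i++) {
        tried[i] = v.next;      /* attempt to connect to the current server */
        toy_advance(&v);        /* cycle step: move on for the next attempt */
        if (advance_in_error)
            toy_advance(&v);    /* error path moves the pointer again */
    }
}

#ifdef TOY_DEMO_MAIN
/* Optional demo: build with -DTOY_DEMO_MAIN to print the attempt sequence. */
int main(void) {
    int tried[5];
    toy_simulate(2, 5, 1, tried);
    for (int i = 0; i < 5; i++)
        printf("S%d\n", tried[i] + 1);
    return 0;
}
#endif
```

In this model, with two servers and the extra advance enabled, every attempt
uses index 0. With the extra advance disabled, the attempts alternate between
index 0 and index 1, which is the behavior one would expect from the client.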
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)