Dutch T. Meyer created ZOOKEEPER-1856:
-----------------------------------------

             Summary: zookeeper C-client can fail to switch from a dead server 
in a 3+ server ensemble if the client only has a 2 server list.
                 Key: ZOOKEEPER-1856
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1856
             Project: ZooKeeper
          Issue Type: Bug
          Components: c client
            Reporter: Dutch T. Meyer
            Priority: Minor


If a client has a 2-server list, is currently connected to the last server 
in that list, and that server then goes offline, the addrvec_next() call in 
handle_error() will push the client to the start of the list and terminate the 
connection.

Then zoo_cycle_next_server() is called in zookeeper_interest() in response to 
the connection failure, and the client cycles back to the failed server.

In this way, a client that has a list of only 2 servers can get stuck on the one 
failed server.  This is only an issue in an ensemble larger than 2, of course, 
because losing 1 server out of 2 would lead to quorum loss anyway.
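
The double advance described above can be modelled with a toy cursor over the 
server list.  This is only a sketch of the reported behaviour, not the real 
client code: struct cursor, cursor_advance(), reconnect_after_failure() and 
stuck_on_failed_server() are hypothetical names, and each cursor_advance() call 
merely stands in for one of the two calls named in this report.

```c
/* Toy model of the reported behaviour; not the actual C client code. */
struct cursor {
    int next;   /* index of the server the client will try next */
    int count;  /* number of servers in the client's list */
};

/* Move the cursor forward by one entry, wrapping at the end of the list. */
static int cursor_advance(struct cursor *c)
{
    c->next = (c->next + 1) % c->count;
    return c->next;
}

/* One failure triggers two advances: the first stands in for the
 * addrvec_next() call in handle_error(), the second for the
 * zoo_cycle_next_server() call in zookeeper_interest(). */
static int reconnect_after_failure(struct cursor *c)
{
    cursor_advance(c);
    return cursor_advance(c);
}

/* Returns 1 if a client connected to server `failed` ends up retrying
 * that same server after the failure, 0 otherwise. */
int stuck_on_failed_server(int count, int failed)
{
    struct cursor c = { failed, count };
    return reconnect_after_failure(&c) == failed;
}
```

In this model stuck_on_failed_server(2, 1) returns 1, because two advances on a 
2-entry list land back on the starting entry, while stuck_on_failed_server(3, 2) 
returns 0.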

There are other harmonics possible if every other server in the list has failed, 
but this is simplest to reproduce in a 3 server ensemble where the client only 
knows about 2 servers, one of which then fails.  There are probably some 
elegant fixes here, but I think the simplest is to add a flag to track whether 
a server has been accessed before, and if it hasn't, don't call 
zoo_cycle_next_server() at the top of the zookeeper_interest() function.
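
A minimal sketch of the flag idea, under stated assumptions: the struct and 
function names below (struct client, next_untried, on_connection_error(), 
pick_server_for_reconnect()) are hypothetical and do not exist in the C client. 
The sketch only shows the intended control flow, where the error path marks the 
newly selected server as not yet tried and the reconnect path skips its extra 
advance in that case.

```c
#include <stdbool.h>

/* Hypothetical client state; not the real zhandle_t layout. */
struct client {
    int  next;          /* index of the server to try next */
    int  count;         /* number of servers in the client's list */
    bool next_untried;  /* assumed flag: cursor already moved to a fresh server */
};

static void advance(struct client *c)
{
    c->next = (c->next + 1) % c->count;
}

/* Error path: move off the failed server and remember that the new
 * entry has not been attempted yet. */
void on_connection_error(struct client *c)
{
    advance(c);
    c->next_untried = true;
}

/* Reconnect path: only cycle to the next server if the current entry
 * has already been attempted. Returns the index to connect to. */
int pick_server_for_reconnect(struct client *c)
{
    if (!c->next_untried)
        advance(c);
    c->next_untried = false;
    return c->next;
}
```

With a 2-entry list and the client on index 1, on_connection_error() followed by 
pick_server_for_reconnect() yields index 0 in this sketch, so the client tries 
the other server instead of cycling back to the failed one.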



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
