[ https://issues.apache.org/jira/browse/ZOOKEEPER-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flavio Junqueira updated ZOOKEEPER-1856:
----------------------------------------
Fix Version/s: 3.6.0
               3.5.3
> zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1856
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1856
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Reporter: Dutch T. Meyer
> Assignee: Michi Mutsuzaki
> Priority: Minor
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-1856.patch
>
>
> If a client has a 2-server list, is currently connected to the last
> server in that list, and that server then goes offline, the addrvec_next()
> call in handle_error() will push the client to the start of the list and
> terminate the connection.
> Then, zoo_cycle_next_server() will be called from zookeeper_interest() in
> response to the connection failure, and the client will cycle back to the
> failed server.
> In this way, a client that has a list of only 2 servers can get stuck on the
> one failed server. Of course, this is only an issue in an ensemble larger than
> 2, because failing 1 server out of 2 would lead to quorum loss anyway.
> Other harmonics are possible if every other server in the list has
> failed, but this is simplest to reproduce in a 3-server ensemble where the
> client only knows about 2 servers, one of which then fails. There are
> probably some elegant fixes here, but I think the simplest is to add a flag
> that tracks whether a server has been accessed before and, if it hasn't, to
> skip the zoo_cycle_next_server() call at the top of the zookeeper_interest()
> function.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)