[ https://issues.apache.org/jira/browse/ACCUMULO-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Newton resolved ACCUMULO-1572.
-----------------------------------
Resolution: Fixed
Fix Version/s: 1.4.4
Wrote an integration test that reproduced the problem, then removed the
fail-fast behavior on a lost ZooKeeper connection.
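
For readers unfamiliar with the distinction involved, here is a minimal sketch of
the idea, not the actual Accumulo patch. A ZooKeeper client that loses its
connection to one server will normally try the other servers in its connect
string, and the session is only lost if it cannot reconnect before the session
timeout. The sketch assumes the fix amounts to treating a disconnect as transient
and only an expired session as fatal. The names KeeperState, Action,
on_session_event and fail_fast_on_disconnect are hypothetical stand-ins written
for this illustration; they are not ZooKeeper or Accumulo classes.

```python
from enum import Enum


class KeeperState(Enum):
    """Stand-in for ZooKeeper session states (hypothetical, not the real client class)."""
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class Action(Enum):
    """What a server process decides to do after a session event (hypothetical)."""
    CONTINUE = "continue"
    WAIT_FOR_RECONNECT = "wait"
    HALT = "halt"


def on_session_event(state, fail_fast_on_disconnect=False):
    """Decide how to react to a ZooKeeper session state change.

    With fail_fast_on_disconnect=True this models the reported behavior:
    losing a single ensemble member halts the process. With the default
    (False) a disconnect is treated as transient, leaving the client
    library free to reconnect to another ensemble member.
    """
    if state is KeeperState.EXPIRED:
        # The session is gone; locks and ephemeral nodes are lost.
        return Action.HALT
    if state is KeeperState.DISCONNECTED:
        if fail_fast_on_disconnect:
            return Action.HALT
        return Action.WAIT_FOR_RECONNECT
    return Action.CONTINUE


if __name__ == "__main__":
    for flag in (True, False):
        action = on_session_event(KeeperState.DISCONNECTED, fail_fast_on_disconnect=flag)
        print(f"fail_fast_on_disconnect={flag}: {action.value}")
```

In this model, a server that waits for the reconnect survives the loss of one
ensemble member, while an expired session still stops the process.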
> single node zookeeper failure kills connected accumulo servers
> --------------------------------------------------------------
>
> Key: ACCUMULO-1572
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1572
> Project: Accumulo
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.5.0
> Reporter: Eric Newton
> Assignee: Eric Newton
> Priority: Blocker
> Fix For: 1.4.4, 1.5.1, 1.6.0
>
>
> Drew Thornton writes on the user mailing list:
> {quote}
> If one zookeeper node is shutdown/fails/whatever and the rest of the ensemble
> stays up, the tablet servers attached as clients to the shutdown node
> immediately fail. If one of the clients happens to be the master, the cluster
> goes down.
> Accumulo does not seem to be failing over to the remaining zookeeper nodes,
> and this causes me to restart the individual tablet servers again.
> The zookeeper ensemble is very stable and has plenty of
> bandwidth/memory/processing, so taking one node down out of five doesn't
> crash the zookeepers, just the tablet servers...
> {quote}
--
This message is automatically generated by JIRA.