[
https://issues.apache.org/jira/browse/ACCUMULO-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Newton resolved ACCUMULO-777.
----------------------------------
Resolution: Fixed
Fixed in r1397117 and r1397120.
> isLockHeld needs better bullet-proofing against transient errors
> ----------------------------------------------------------------
>
> Key: ACCUMULO-777
> URL: https://issues.apache.org/jira/browse/ACCUMULO-777
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.4.1, 1.3.6, 1.4.0, 1.3.5
> Environment: medium sized cluster
> Reporter: Eric Newton
> Assignee: Eric Newton
> Fix For: 1.4.2
>
>
> During a minor compaction, the tablet server double-checks its ZooKeeper
> lock before updating the METADATA table. In one unlucky moment, the
> ZooKeeper connection was lost during this check. The check failed even
> though the lock had not actually been lost. As a result, the root tablet
> remained hosted for another 4 weeks but did not flush mutations to disk.
> When memory filled, the operator noticed a long hold time and killed the
> server. This forced a log recovery of 98 1G logs, some of which were very
> old.
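The core problem is that a transient error (a lost ZooKeeper connection) was treated as a definite "lock not held" answer. A minimal sketch of the bullet-proofing idea follows, assuming a hypothetical `check` callable that either returns the real lock state or raises a hypothetical `TransientError` (the Java analogue would be ZooKeeper's `KeeperException.ConnectionLossException`). It is written in Python for brevity and is not the actual change made in r1397117/r1397120.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable error such as a lost ZooKeeper connection."""


def is_lock_held(check: Callable[[], bool], retries: int = 5, delay_s: float = 0.1) -> bool:
    """Return the lock state reported by `check`, retrying on TransientError.

    Only a definite answer from `check` is trusted. A transient failure is
    retried with exponential backoff instead of being reported as "lock lost".
    """
    for attempt in range(retries):
        try:
            return check()
        except TransientError:
            if attempt == retries - 1:
                raise  # still unsure: surface the error rather than guess the lock state
            time.sleep(delay_s * (2 ** attempt))
    raise ValueError("retries must be at least 1")


def make_flaky_check(failures: int, result: bool) -> Callable[[], bool]:
    """Hypothetical test helper: raises TransientError `failures` times, then returns `result`."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise TransientError("connection loss")
        return result

    return check


# Two transient failures, then the lock is confirmed as held.
print(is_lock_held(make_flaky_check(2, True), delay_s=0))
```

Raising after the retries are exhausted, instead of returning `False`, forces the caller to decide explicitly what to do when the lock state is unknown (for example, halting the server) rather than silently continuing to host a tablet whose flushes are being refused.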