Eric Newton created ACCUMULO-777:
------------------------------------
Summary: isLockHeld needs better bullet-proofing against transient
errors
Key: ACCUMULO-777
URL: https://issues.apache.org/jira/browse/ACCUMULO-777
Project: Accumulo
Issue Type: Bug
Components: client
Affects Versions: 1.3.5, 1.4.0, 1.3.6, 1.4.1
Environment: medium sized cluster
Reporter: Eric Newton
Assignee: Eric Newton
Fix For: 1.4.2, 1.4.1
During a minor compaction, the tablet server double-checks its ZooKeeper
lock before updating the METADATA table information. In one unlucky moment,
the ZooKeeper connection was lost during this check. The tablet server
failed the check even though the lock had not been lost. As a result, the
root tablet remained hosted for another 4 weeks but did not flush mutations
to disk. When memory filled, the operator noticed a long hold time and
killed the server. This caused a log recovery of 98 1G logs, some of which
were very old.
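
One way to harden the check against this kind of failure is sketched below.
This is a minimal illustration under stated assumptions, not the actual
Accumulo fix: LockProbe, TransientCheckException, LockState and
checkWithRetry are hypothetical names invented for this example. In real
ZooKeeper client code a dropped connection typically surfaces as
KeeperException.ConnectionLossException; a stand-in exception is used here
so the sketch stays self-contained. The point is that a transient error is
retried and, if it persists, reported as UNKNOWN rather than treated as a
lost lock.

```java
public class LockCheckSketch {

    /** Hypothetical stand-in for a transient error such as a dropped connection. */
    static class TransientCheckException extends Exception {
        TransientCheckException(String message) {
            super(message);
        }
    }

    /** Three outcomes, so "could not check" is never confused with "lock lost". */
    enum LockState { HELD, LOST, UNKNOWN }

    /** Hypothetical probe; a real one would ask ZooKeeper about the lock node. */
    interface LockProbe {
        boolean isLockHeld() throws TransientCheckException;
    }

    /**
     * Runs the probe up to maxAttempts times, sleeping a little longer after
     * each transient failure. Returns UNKNOWN if every attempt failed.
     */
    static LockState checkWithRetry(LockProbe probe, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return probe.isLockHeld() ? LockState.HELD : LockState.LOST;
            } catch (TransientCheckException e) {
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        return LockState.UNKNOWN;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated probe: the first two checks hit a lost connection, the third succeeds.
        LockProbe flaky = () -> {
            if (calls[0]++ < 2) {
                throw new TransientCheckException("connection lost");
            }
            return true;
        };
        System.out.println(checkWithRetry(flaky, 5, 10)); // prints HELD
    }
}
```

With a three-valued result, the caller can postpone the METADATA update and
try again later on UNKNOWN, and only give up the tablet when the lock is
definitely LOST.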