Eric Newton created ACCUMULO-777:
------------------------------------

             Summary: isLockHeld needs better bullet-proofing against transient errors
                 Key: ACCUMULO-777
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-777
             Project: Accumulo
          Issue Type: Bug
          Components: client
    Affects Versions: 1.3.5, 1.4.0, 1.3.6, 1.4.1
         Environment: medium sized cluster
            Reporter: Eric Newton
            Assignee: Eric Newton
             Fix For: 1.4.2, 1.4.1


During a minor compaction, the tablet server's ZooKeeper lock is 
double-checked before the METADATA table information is updated.  In one 
unlucky moment, the ZooKeeper connection was lost during this check.  The 
tablet server failed the check even though the lock had not been lost.  As a 
result, the root tablet remained hosted for another 4 weeks but did not flush 
mutations to disk.  When memory filled, the operator noticed a long hold time 
and killed the server.  This caused a log recovery of 98 logs of 1G each, 
some of which were very old.
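
The failure mode above comes from treating "could not complete the check" 
the same as "the lock is gone".  A minimal sketch of one way to separate the 
two follows.  It is not the Accumulo code and does not use the ZooKeeper 
client API: LockProbe, TransientCheckException, LockStatus and checkLock are 
hypothetical names invented for illustration, and the retry count and backoff 
are arbitrary.  A transient error is retried, and if every attempt fails the 
result is UNKNOWN rather than LOST, so the caller can decide what to do.

```java
/** Result of a lock check. UNKNOWN means the check itself could not complete. */
enum LockStatus { HELD, LOST, UNKNOWN }

/** Hypothetical stand-in for a transient failure, e.g. a dropped connection. */
class TransientCheckException extends Exception {
    TransientCheckException(String message) { super(message); }
}

/** Hypothetical probe: true if the lock node exists, false if it is gone. */
interface LockProbe {
    boolean lockNodeExists() throws TransientCheckException;
}

final class LockCheckSketch {
    /**
     * Retries the probe on transient errors, with a linearly growing pause.
     * Returns HELD or LOST only when the probe gave a definite answer.
     */
    static LockStatus checkLock(LockProbe probe, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return probe.lockNodeExists() ? LockStatus.HELD : LockStatus.LOST;
            } catch (TransientCheckException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, fall through to UNKNOWN
                }
                try {
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return LockStatus.UNKNOWN;
                }
            }
        }
        return LockStatus.UNKNOWN;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated probe: fails twice with a transient error, then succeeds.
        LockProbe flaky = () -> {
            if (calls[0]++ < 2) {
                throw new TransientCheckException("connection lost");
            }
            return true;
        };
        System.out.println(checkLock(flaky, 5, 10)); // prints HELD
    }
}
```

In this sketch a caller would abandon the METADATA update only on LOST.  How 
UNKNOWN should be handled (keep retrying, or halt the server) is a policy 
choice that this report does not settle.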
