[ 
https://issues.apache.org/jira/browse/ACCUMULO-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Newton resolved ACCUMULO-777.
----------------------------------

    Resolution: Fixed

Fixed in r1397117 and r1397120.
                
> isLockHeld needs better bullet-proofing against transient errors
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-777
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-777
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.3.6, 1.4.0, 1.3.5
>         Environment: medium sized cluster
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.4.2
>
>
> During a minor compaction, the ZooKeeper lock for the tablet server is 
> double-checked before the METADATA table information is updated.  In one 
> unlucky moment, the ZooKeeper connection was lost during this check.  The 
> tablet server failed the check even though the lock had not been lost.  As a 
> result, the root tablet remained hosted for another 4 weeks but did not 
> flush mutations to disk.  When memory filled, the operator noticed a long 
> hold time and killed the server.  This caused a log recovery of 98 logs of 
> 1G each, some of which were very old.
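
The failure mode described above (a transient connection error being treated
as "lock not held") can be illustrated with a minimal sketch. This is not the
change committed in r1397117 or r1397120. The names LockCheckSketch, LockProbe,
and TransientLockCheckException are hypothetical and stand in for whatever
ZooKeeper call and exception type the real code uses. The idea is only that a
check should return false when the lock is positively known to be gone, and
should retry (or fail loudly) when the answer is unknown.

```java
public class LockCheckSketch {

    /** Hypothetical stand-in for a transient failure, e.g. a dropped ZooKeeper connection. */
    public static class TransientLockCheckException extends Exception {
        public TransientLockCheckException(String message) {
            super(message);
        }
    }

    /** Hypothetical probe: true if the lock node exists, false if it is definitely gone. */
    public interface LockProbe {
        boolean lockNodeExists() throws TransientLockCheckException;
    }

    /**
     * Returns the probe's answer as soon as one attempt completes.
     * Transient failures are retried up to maxAttempts times; if every attempt
     * fails, the last exception is rethrown instead of reporting "not held".
     */
    public static boolean isLockHeld(LockProbe probe, int maxAttempts, long retryDelayMillis)
            throws TransientLockCheckException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientLockCheckException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // A completed probe is authoritative, whether true or false.
                return probe.lockNodeExists();
            } catch (TransientLockCheckException e) {
                // The answer is unknown, not "false": remember the error and retry.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMillis);
                }
            }
        }
        throw last;
    }
}
```

With this shape, a caller such as a minor compaction can tell "lock lost" apart
from "could not check", and can choose to halt or keep retrying in the second
case rather than silently skipping the flush.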

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
