[ 
https://issues.apache.org/jira/browse/ACCUMULO-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915471#comment-13915471
 ] 

Josh Elser commented on ACCUMULO-2422:
--------------------------------------

How long does the backup master keep waiting in this state? From your 
description, I'm assuming at least seconds, if not minutes. Or does it wait 
indefinitely?

Getting jstack output from the masters in this state would be good. Also, you 
should check the data in ZooKeeper at /accumulo/uuid/masters and any children at 
/accumulo/uuid/masters/lock/zlock-* to see what's going on there. It's possible 
that one of the convenience methods we wrap around ZK has a bug, but the code 
there is primarily ZK's own.
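
As a rough aid for reading the children under the lock node, here is a minimal 
sketch of how the standard ZooKeeper lock recipe orders candidates: the child 
with the lowest sequence number holds the lock, and each other child waits on 
the one just ahead of it. It assumes the children are named zlock-<sequence> and 
that our lock wrapper follows that recipe; the function names and the sample 
listing are hypothetical, not taken from the Accumulo code.

```python
import re

# Assumption: lock children are named "zlock-" followed by a sequence
# number assigned by ZooKeeper, as the zlock-* pattern suggests.
_ZLOCK = re.compile(r"^zlock-(\d+)$")


def lock_queue(children):
    """Return the zlock-* child names ordered by sequence number."""
    entries = []
    for name in children:
        match = _ZLOCK.match(name)
        if match:  # ignore anything that is not a lock candidate
            entries.append((int(match.group(1)), name))
    return [name for _, name in sorted(entries)]


def describe(children):
    """Map each lock child to its role under the standard lock recipe."""
    queue = lock_queue(children)
    roles = {}
    for i, name in enumerate(queue):
        if i == 0:
            # Lowest sequence number: this candidate should hold the lock.
            roles[name] = "holds lock"
        else:
            # Everyone else waits for the candidate directly ahead.
            roles[name] = "waiting on " + queue[i - 1]
    return roles


if __name__ == "__main__":
    # Hypothetical listing, e.g. pasted from `ls` in the ZooKeeper CLI.
    listing = ["zlock-0000000007", "zlock-0000000005"]
    for name, role in describe(listing).items():
        print(name, "->", role)
```

If the child that this sketch marks as the holder belongs to a master that is 
still sitting in its wait loop, that would point at a missed watch or 
notification rather than at stale data in ZK.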

> Backup master can miss acquiring lock when primary exits
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2422
>             Project: Accumulo
>          Issue Type: Bug
>          Components: fate, master
>    Affects Versions: 1.5.0
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>            Priority: Critical
>              Labels: failover, locking
>
> While running randomwalk tests with agitation for the 1.5.1 release, I've 
> seen situations where a backup master that is eligible to grab the master 
> lock continues to wait. When this condition arises and the other master 
> restarts, both wait for the lock without success.
> I cannot reproduce the problem reliably, and I think more investigation is 
> needed to see what circumstances could be causing the problem.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
