Nick Dimiduk created HBASE-12467:
------------------------------------

             Summary: Master joins cluster but never completes initialization
                 Key: HBASE-12467
                 URL: https://issues.apache.org/jira/browse/HBASE-12467
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Nick Dimiduk
            Assignee: Nick Dimiduk
             Fix For: 2.0.0, 0.98.9, 0.99.2


While diagnosing a rare failure in IntegrationTestLoadAndVerify, I discovered 
this scenario. Master was restarted by CM. Upon rejoining the cluster it 
successfully assumed responsibility as active master, but apparently the 
finishInitialization method never completes. The last log line from that thread 
is

{noformat}
2014-11-10 17:01:29,940 INFO  [master:ip-172-31-9-135:60000] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=ip-172-31-9-136.ec2.internal,60020,1415638551951
{noformat}

I see region states populated from existing znodes. The AssignmentManager (AM) 
inventoried the online regions and acknowledged that this was a master 
failover. There it sits, responding to RPCs with 
{{PleaseHoldException: Master is initializing}}.

For the sake of resiliency, we should detect this scenario and at least release 
control as active master.
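
One possible shape for that detection is a watchdog timer armed when the 
master becomes active and disarmed when initialization finishes. The following 
is only a minimal sketch under stated assumptions: {{InitializationWatchdog}}, 
{{markInitialized}} and the {{onTimeout}} callback are hypothetical names for 
illustration and are not existing HBase classes or methods. What the callback 
does (for example, giving up the active-master role) is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper for illustration only; not an HBase class.
final class InitializationWatchdog {
    // Flipped exactly once, by whichever side wins: initialization or the timeout.
    private final AtomicBoolean settled = new AtomicBoolean(false);
    private final ScheduledExecutorService timer;

    InitializationWatchdog(long timeout, TimeUnit unit, Runnable onTimeout) {
        timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "init-watchdog");
            t.setDaemon(true); // never keep the JVM alive just for the watchdog
            return t;
        });
        timer.schedule(() -> {
            try {
                // Only fire if initialization has not already been reported.
                if (settled.compareAndSet(false, true)) {
                    onTimeout.run();
                }
            } finally {
                timer.shutdown(); // one-shot: release the timer thread
            }
        }, timeout, unit);
    }

    // Called at the end of initialization. Returns true if it beat the
    // timeout, false if the timeout callback has already been triggered.
    boolean markInitialized() {
        boolean won = settled.compareAndSet(false, true);
        if (won) {
            timer.shutdownNow(); // cancel the pending timeout task
        }
        return won;
    }
}
```

In this sketch the initializing thread would call {{markInitialized()}} as its 
last step. The compare-and-set ensures that either the timeout action or the 
"initialized" transition takes effect, never both.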



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
