Bahram Chehrazy created HBASE-21894:
---------------------------------------

             Summary: Master doesn't update the meta state as soon as the meta 
server dies
                 Key: HBASE-21894
                 URL: https://issues.apache.org/jira/browse/HBASE-21894
             Project: HBase
          Issue Type: Bug
          Components: master, meta
    Affects Versions: 3.0.0
            Reporter: Bahram Chehrazy
            Assignee: Bahram Chehrazy


If the active master crashes after the meta server dies, there is a slight 
chance of the master getting into a state where ZooKeeper says meta is OPEN, 
but the hosting server is dead and there is no active ServerCrashProcedure 
(SCP) to recover it (perhaps the SCP was aborted and the procWALs were 
corrupted). In this case waitForMetaOnline never returns.

 

We've seen this happen a few times after a temporary HDFS outage. The 
following log line shows this state.

 

2019-01-17 18:55:48,497 WARN  [master/************:16000:becomeActiveMaster] 
master.HMaster: hbase:meta,,1.1588230740 is NOT online; 
state={1588230740 state=OPEN, ts=1547780128227, 
server=*************,16020,1547776821322}; ServerCrashProcedures=false. 
Master startup cannot progress, in holding-pattern until region onlined.

 

I'm still investigating why this happens and how to prevent it. Regardless, 
the master should be able to recover during a restart by initiating a new SCP 
to fix meta.
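
To make the proposed recovery concrete, here is a minimal sketch of the check 
a starting master could run before waiting on meta. It is not HBase code: the 
class MetaRecoverySketch, its nested types, the needsNewScp method, and the 
example host names are all hypothetical, and the real master would read this 
information from ZooKeeper, the server manager, and the procedure store.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these types exist in HBase. */
final class MetaRecoverySketch {

    /** Simplified region states, enough for this illustration. */
    enum State { OPEN, OPENING, CLOSED, OFFLINE }

    /** What ZooKeeper records for hbase:meta (assumed shape). */
    static final class MetaLocation {
        final State state;
        final String serverName;

        MetaLocation(State state, String serverName) {
            this.state = state;
            this.serverName = serverName;
        }
    }

    /**
     * Returns true for the stuck case described above: meta is recorded as
     * OPEN, the recorded server is not live, and no SCP is running for it.
     */
    static boolean needsNewScp(MetaLocation meta,
                               Set<String> liveServers,
                               Set<String> serversWithActiveScp) {
        if (meta.state != State.OPEN) {
            return false; // meta is already in transition or offline
        }
        if (liveServers.contains(meta.serverName)) {
            return false; // the hosting server is alive, nothing to recover
        }
        // Dead server: a new SCP is needed only if none is already in flight.
        return !serversWithActiveScp.contains(meta.serverName);
    }

    public static void main(String[] args) {
        // Host names are made up for the example.
        MetaLocation meta =
            new MetaLocation(State.OPEN, "rs1.example.org,16020,1547776821322");
        Set<String> live = new HashSet<>(
            Arrays.asList("rs2.example.org,16020,1547776900000"));
        Set<String> activeScps = Collections.emptySet();

        System.out.println("schedule new SCP: "
            + needsNewScp(meta, live, activeScps));
    }
}
```

If a check like this returned true during startup, the master could schedule 
a fresh SCP for the dead server instead of staying in the holding pattern.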

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
