[ https://issues.apache.org/jira/browse/HBASE-21894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bahram Chehrazy updated HBASE-21894:
------------------------------------
Description:
When the meta server dies, the Master moves that server to the deadServers list
and submits an SCP, but it doesn't change the meta region state (to CLOSING,
CLOSED or OFFLINE) until after the SCP finishes. Only then does the meta region
state change from OPEN to OPENING, and then quickly back to OPEN.
This could cause problems if some procedures try to update meta while the master
is recovering the meta region or, even worse, if the master also dies in the
meantime. Other potential problems include servers trying to update meta while
it is down, causing them to abort after several retries.
was:
If the active master crashes after the meta server dies, there is a slight
chance of the master getting into a state where ZK says meta is OPEN, but the
server is dead and there is no active SCP to recover it (perhaps the SCP has
aborted and the procWALs were corrupted). In this case waitForMetaOnline never
returns.
We've seen this happen a few times after a temporary HDFS outage. The following
log lines show this state.
2019-01-17 18:55:48,497 WARN [master/************:16000:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
{1588230740 *state=*OPEN**, ts=1547780128227,
server=*************,16020,1547776821322}
; *ServerCrashProcedures=false*. Master startup cannot progress, in
holding-pattern until region onlined.
I'm still investigating why and how to prevent getting into this bad state, but
nevertheless the master should be able to recover during a restart by
initiating a new SCP to fix the meta.
> Master doesn't update the meta state as soon as the meta server dies
> --------------------------------------------------------------------
>
> Key: HBASE-21894
> URL: https://issues.apache.org/jira/browse/HBASE-21894
> Project: HBase
> Issue Type: Bug
> Components: master, meta
> Affects Versions: 3.0.0
> Reporter: Bahram Chehrazy
> Assignee: Bahram Chehrazy
> Priority: Major
>
>
> When the meta server dies, the Master moves that server to the deadServers
> list and submits an SCP, but it doesn't change the meta region state (to
> CLOSING, CLOSED or OFFLINE) until after the SCP finishes. Only then does the
> meta region state change from OPEN to OPENING, and then quickly back to OPEN.
>
> This could cause problems if some procedures try to update meta while the
> master is recovering the meta region or, even worse, if the master also dies
> in the meantime. Other potential problems include servers trying to update
> meta while it is down, causing them to abort after several retries.
>
>
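To make the ordering problem described above concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: MetaCrashSketch, onServerCrashReported, onServerCrashEager, finishRecovery and metaLooksUsable are hypothetical names invented for illustration and are not HBase classes or methods. Choosing OFFLINE as the intermediate state is also an assumption; the description equally allows CLOSING or CLOSED.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only. None of these types or methods exist in HBase. */
final class MetaCrashSketch {
    enum RegionState { OPEN, OPENING, OFFLINE }

    private RegionState metaState = RegionState.OPEN;
    private final List<String> deadServers = new ArrayList<>();
    private String metaServer;

    MetaCrashSketch(String metaServer) {
        this.metaServer = metaServer;
    }

    /** Reported behavior: the server is marked dead, but the meta state stays OPEN. */
    void onServerCrashReported(String server) {
        deadServers.add(server);
        // An SCP would be submitted here; metaState is left untouched until it finishes.
    }

    /** Hypothetical alternative: take meta out of OPEN as soon as its server is known dead. */
    void onServerCrashEager(String server) {
        deadServers.add(server);
        if (server.equals(metaServer)) {
            metaState = RegionState.OFFLINE; // assumption: OFFLINE rather than CLOSING or CLOSED
        }
    }

    /** End of recovery: the state passes through OPENING and settles on OPEN. */
    void finishRecovery(String newServer) {
        metaState = RegionState.OPENING;
        metaServer = newServer;
        metaState = RegionState.OPEN;
    }

    /** What a caller sees if it only checks the recorded state. */
    boolean metaLooksUsable() {
        return metaState == RegionState.OPEN;
    }

    boolean isDead(String server) {
        return deadServers.contains(server);
    }
}
```

In this model, a caller that checks only the recorded state after onServerCrashReported still sees meta as usable even though its server is in the dead list, which is the window the issue describes. With onServerCrashEager the same check fails until finishRecovery runs.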
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)