[
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513414#comment-16513414
]
Duo Zhang commented on HBASE-20708:
-----------------------------------
{quote}
If a Master joins a cluster where there is no crashed RS, it just scans meta
and away we go?
{quote}
The basic idea is that all recovery work should be done in SCP, including
assigning the meta region. We also know that the region state information for
the meta region is on zk, not in the meta region itself, so we can start
processing recovery for the meta region in SCP before AssignmentManager loads
region states from the meta region. So theoretically this could work. And
actually it does work, as I've already uploaded a patch here.
The second problem is that we need to make sure every crashed RS has an SCP
scheduled for it. This is not straightforward when the master restarts. In the
old implementation, the work is done after we load all the region states from
meta, since then we can get all the RSes that carry regions and compare them
with the online servers to find the dead ones.
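As a rough illustration of the old approach described above, the dead servers
are simply the set difference between the servers that carry regions and the
online servers. This is only a sketch: DeadServerFinder and findDeadServers are
hypothetical names, and server and region names are plain strings rather than
HBase types.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code: regions and servers are plain strings.
final class DeadServerFinder {

    /** Returns the servers that carry at least one region but are not online. */
    static Set<String> findDeadServers(Map<String, String> regionToServer,
                                       Set<String> onlineServers) {
        // Every server referenced by a loaded region state is a candidate.
        Set<String> dead = new HashSet<>(regionToServer.values());
        // Anything still online is not dead.
        dead.removeAll(onlineServers);
        return dead;
    }

    public static void main(String[] args) {
        // Stand-in for the region states loaded from meta.
        Map<String, String> regionStates =
            Map.of("region-a", "rs1", "region-b", "rs2", "region-c", "rs2");
        Set<String> online = Set.of("rs1", "rs3");
        System.out.println(findDeadServers(regionStates, online));
    }
}
```

Note that the input map is exactly what is only available after the region
states have been loaded from meta.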
But this will not work if we want to change to the logic above, as it
introduces a cyclic dependency: SCP scheduling would depend on AM loading the
region states first, but loading region states needs the meta region to be
online, which in turn depends on SCP to bring the meta region online...
So we need to find another way to do this. The basic idea is that we can get
the live servers by scanning the WAL directory, as an RS must initialize the WAL
system before carrying regions (there may be a problem if all the regions on
that RS are WAL-less, but I think at least we can create the parent directory
first). This does not depend on region states, so it can happen before AM loads
the region states.
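A minimal sketch of one reading of this idea: treat every server that has a
directory under the WAL root as a candidate, and schedule an SCP for each
candidate that is not currently online. This uses the local file system via
java.nio instead of HBase's own file system API, and it assumes that each
subdirectory name is exactly a server name. WalDirScanner and its methods are
hypothetical names, not part of the patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not HBase code: assumes one subdirectory per server
// under the WAL root, named after the server.
final class WalDirScanner {

    /** Lists the names of all subdirectories directly under the WAL root. */
    static Set<String> serversWithWalDir(Path walRoot) throws IOException {
        Set<String> servers = new TreeSet<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(walRoot)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    servers.add(child.getFileName().toString());
                }
            }
        }
        return servers;
    }

    /** Servers that have a WAL directory but are not online need an SCP. */
    static Set<String> serversNeedingScp(Path walRoot, Set<String> onlineServers)
            throws IOException {
        Set<String> candidates = serversWithWalDir(walRoot);
        candidates.removeAll(onlineServers);
        return candidates;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway WAL root with two server directories.
        Path walRoot = Files.createTempDirectory("wals");
        Files.createDirectory(walRoot.resolve("rs1"));
        Files.createDirectory(walRoot.resolve("rs2"));
        System.out.println(serversNeedingScp(walRoot, Set.of("rs1")));
    }
}
```

Nothing here reads region states, which is why it can run before AM loads them.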
{quote}
We'll still need serverCrashProcessingEnabled type flag to hold up Master
startup until meta is online?
{quote}
Just using AM.metaRegionLoaded is fine. The serverCrashProcessingEnabled flag
is useless now.
{quote}
I like this idea too... Since a server can only crash once... so queue per
server....
{quote}
Will file a new issue for it, as the patch here is already big enough...
> Remove the usage of RecoverMetaProcedure in master startup
> ----------------------------------------------------------
>
> Key: HBASE-20708
> URL: https://issues.apache.org/jira/browse/HBASE-20708
> Project: HBase
> Issue Type: Bug
> Components: proc-v2, Region Assignment
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.1.0
>
> Attachments: HBASE-20708-v1.patch, HBASE-20708-v2.patch,
> HBASE-20708-v3.patch, HBASE-20708-v4.patch, HBASE-20708-v5.patch,
> HBASE-20708.patch
>
>
> In HBASE-20700, we made RecoverMetaProcedure use a special lock which is only
> used by RMP to avoid deadlock with MoveRegionProcedure. But we will always
> schedule an RMP when the master starts up, so we still need to make sure that
> there is no race between this RMP and other RMPs and SCPs scheduled before
> the master restarts.