[jira] [Commented] (HBASE-20708) Remove the usage of RecoverMetaProcedure in master startup

Duo Zhang (JIRA) Thu, 14 Jun 2018 23:29:07 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-20708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513414#comment-16513414
 ]


Duo Zhang commented on HBASE-20708:
-----------------------------------

{quote}
If a Master joins a cluster where there is no crashed RS, it just scans meta 
and away we go?
{quote}

The basic idea is that all recovery work should be done in SCP, including 
assigning the meta region. We also know that the region state information for 
the meta region is stored in zk, not in the meta region itself, so we can start 
recovering the meta region in SCP before AssignmentManager loads region states 
from the meta region. So theoretically this could work. And it actually does 
work, as I've already uploaded a patch here.

So the second problem here is that we need to make sure that every crashed RS 
has an SCP scheduled for it. This is not straightforward when the master 
restarts. In the old implementation, this was done after loading all the region 
states from meta, since at that point we know all the RSes that carry regions 
and can compare them with the online servers to find the dead ones.

But this will not work if we switch to the logic above, as it introduces a 
cyclic dependency: SCP scheduling depends on AM loading the region states 
first, but loading region states needs the meta region to be online, which in 
turn depends on SCP to bring the meta region online...

So we need to find another way to do this. The basic idea is that we can get 
the live servers by scanning the WAL directory, as an RS must initialize the WAL 
system before carrying regions (there may be a problem if all the regions on 
that RS are WAL-less, but I think we can at least create the parent directory 
first). This does not depend on region states, so it can happen before AM loads 
the region states.
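
A minimal sketch of the comparison described above, not the code from the 
patch. It assumes each WAL directory is named after its server (for example 
"host,port,startcode") and that a directory being split carries a "-splitting" 
suffix. The class and method names are hypothetical, and the directory listing 
is passed in as plain strings so the logic does not depend on any filesystem 
API.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of HBase. */
final class DeadServerFinder {
  // Assumed suffix of a WAL directory whose logs are currently being split.
  private static final String SPLITTING_SUFFIX = "-splitting";

  /**
   * Returns the servers that own a WAL directory but are not online.
   * Each of them would need an SCP scheduled for it.
   */
  static Set<String> findDeadServers(Collection<String> walDirNames,
      Set<String> onlineServers) {
    Set<String> dead = new TreeSet<>();
    for (String dirName : walDirNames) {
      // Strip the splitting suffix so the name matches the server name.
      String serverName = dirName.endsWith(SPLITTING_SUFFIX)
          ? dirName.substring(0, dirName.length() - SPLITTING_SUFFIX.length())
          : dirName;
      if (!onlineServers.contains(serverName)) {
        dead.add(serverName);
      }
    }
    return dead;
  }
}
```

Because the input is only the directory listing and the set of online servers, 
this step needs nothing from meta and can run before AM loads region states.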

{quote}
We'll still need serverCrashProcessingEnabled type flag to hold up Master 
startup until meta is online?
{quote}
Just using AM.metaRegionLoaded is fine. The serverCrashProcessingEnabled flag is 
useless now.
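
A rough sketch of how a single "meta loaded" flag can hold up later startup 
steps. This is not the AssignmentManager implementation: MetaLoadedGate and its 
methods are hypothetical stand-ins for whatever backs AM.metaRegionLoaded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot gate for illustration only; not part of HBase. */
final class MetaLoadedGate {
  private final CountDownLatch loaded = new CountDownLatch(1);

  /** Called once the meta region is online and its state has been loaded. */
  void markMetaLoaded() {
    loaded.countDown();
  }

  /** Non-blocking check, comparable to reading a boolean flag. */
  boolean isMetaLoaded() {
    return loaded.getCount() == 0;
  }

  /** Blocks until meta is loaded or the timeout expires; true if loaded. */
  boolean awaitMetaLoaded(long timeout, TimeUnit unit) {
    try {
      return loaded.await(timeout, unit);
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller and report "not loaded".
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Startup steps that need meta would wait on the gate, while the SCP that brings 
meta online would not, which avoids a second flag dedicated to crash 
processing.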

{quote}
I like this idea too... Since a server can only crash once... so queue per 
server....
{quote}
Will file a new issue for it, as the patch here is already big enough...

> Remove the usage of RecoverMetaProcedure in master startup
> ----------------------------------------------------------
>
>                 Key: HBASE-20708
>                 URL: https://issues.apache.org/jira/browse/HBASE-20708
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2, Region Assignment
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Blocker
>             Fix For: 3.0.0, 2.1.0
>
>         Attachments: HBASE-20708-v1.patch, HBASE-20708-v2.patch, 
> HBASE-20708-v3.patch, HBASE-20708-v4.patch, HBASE-20708-v5.patch, 
> HBASE-20708.patch
>
>
> In HBASE-20700, we made RecoverMetaProcedure use a special lock which is only 
> used by RMP to avoid deadlock with MoveRegionProcedure. But we always 
> schedule an RMP when the master is starting up, so we still need to make sure 
> that there is no race between this RMP and other RMPs and SCPs scheduled 
> before the master restarts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
