[ https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611658#comment-16611658 ]

Duo Zhang commented on HBASE-21035:
-----------------------------------

I think the most difficult part here is deciding when something really is wrong
and we should force meta online so that hbck2 can do something. It would be
easy to introduce races if we deploy a FixingMetaProcedure when there is
actually no problem...

[~stack] So the problem here is that if we cannot loadMeta, the master will
exit and we can do nothing from outside? IIRC the rpc service should already
have been started, otherwise we could not assign meta. So maybe the fix is to
make the master retry for longer before exiting, and to add a new method to the
rpc service that hbck2 can call to schedule recovery procedures?
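A minimal sketch of the "retry for longer before exiting" idea, assuming the
startup step (e.g. loading meta) can be wrapped as a yes/no check. The names
MetaLoadRetry and retryWithBackoff are hypothetical, not HBase APIs, and the
backoff values are arbitrary.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: retry a startup step with capped exponential backoff
 * instead of exiting after the first failure. Not HBase code.
 */
public class MetaLoadRetry {

  /** Returns true if the step succeeded within maxAttempts, false otherwise. */
  public static boolean retryWithBackoff(BooleanSupplier step, int maxAttempts,
      long initialSleepMs, long maxSleepMs) throws InterruptedException {
    long sleepMs = initialSleepMs;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (step.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        Thread.sleep(sleepMs);
        // Double the wait each time, but never beyond the cap.
        sleepMs = Math.min(sleepMs * 2, maxSleepMs);
      }
    }
    return false; // only now would the master give up and exit
  }

  public static void main(String[] args) throws InterruptedException {
    int[] calls = {0};
    // Simulated step that only succeeds on the third call, e.g. after an
    // operator has used a (hypothetical) hbck2 rpc to schedule recovery.
    boolean ok = retryWithBackoff(() -> ++calls[0] >= 3, 10, 10, 100);
    System.out.println(ok + " after " + calls[0] + " attempts");
  }
}
```

Running main prints "true after 3 attempts". The longer retry window is what
would give an operator time to reach the master through the new rpc method
before it exits.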

> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
>                 Key: HBASE-21035
>                 URL: https://issues.apache.org/jira/browse/HBASE-21035
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21035.branch-2.0.001.patch, 
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed how the master initializes after it starts. It
> now only checks the WAL dirs and compares them against the ZooKeeper RS nodes
> to decide which servers need to be expired. For servers whose dir ends with
> 'SPLITTING', we assume that there will already be an SCP for it.
> But if the server hosting the meta region crashed before the master
> restarted, and all the procedure WALs are lost (due to a bug, manual deletion,
> or whatever), the restarted master will be stuck initializing, since no one
> will bring the meta region online.
> Although it is an anomalous case, I think that no matter what happens, we
> need to bring the meta region online. Otherwise we are sitting ducks; nothing
> can be done.
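To make the failure mode in the description concrete, here is a hypothetical
sketch of the startup check it describes: WAL dirs without a live ZooKeeper RS
node are expired, while 'SPLITTING' dirs are assumed to already have an SCP.
The class and method names are illustrative only, not HBase's actual code. The
"-splitting" suffix and the host,port,startcode server names are assumptions
made for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of the post-HBASE-20708 startup check. */
public class StartupServerCheck {
  static final String SPLITTING_SUFFIX = "-splitting"; // assumed suffix

  /** Servers with a WAL dir but no live ZK node, and not already splitting. */
  public static Set<String> serversToExpire(List<String> walDirs, Set<String> liveZkServers) {
    Set<String> toExpire = new TreeSet<>();
    for (String dir : walDirs) {
      if (dir.endsWith(SPLITTING_SUFFIX)) {
        continue; // assumed to already have an SCP in the procedure store
      }
      if (!liveZkServers.contains(dir)) {
        toExpire.add(dir);
      }
    }
    return toExpire;
  }

  /** Splitting servers with no surviving SCP: nothing will ever recover them. */
  public static Set<String> orphanedSplittingServers(List<String> walDirs, Set<String> serversWithScp) {
    Set<String> orphaned = new TreeSet<>();
    for (String dir : walDirs) {
      if (dir.endsWith(SPLITTING_SUFFIX)) {
        String server = dir.substring(0, dir.length() - SPLITTING_SUFFIX.length());
        if (!serversWithScp.contains(server)) {
          orphaned.add(server);
        }
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    List<String> walDirs = List.of("rs1,16020,1", "rs2,16020,1-splitting", "rs3,16020,1");
    Set<String> live = Set.of("rs1,16020,1");
    Set<String> withScp = Set.of(); // procedure WALs lost: no SCP survived
    System.out.println("expire: " + serversToExpire(walDirs, live));
    System.out.println("orphaned: " + orphanedSplittingServers(walDirs, withScp));
  }
}
```

Running main prints "expire: [rs3,16020,1]" and "orphaned: [rs2,16020,1]". If
rs2 was carrying meta, no procedure will ever assign it again, which is the
stuck-initialization case described in the issue.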



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)