[
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612645#comment-16612645
]
stack commented on HBASE-21035:
-------------------------------
I put up a patch on HBASE-21191 that steals [~allan163]'s patch from here. It
has master go into a holding pattern complaining that hbase:meta is not online
asking for operator intervention. Ditto for namespace table. Will work on doc
on what operators need to do to effect repair in a while after I've backfilled
more of the hbck2 stuff. A test over in HBASE-21191 shows that scheduling an
hbase:meta assign seems to do the job getting the cluster up again (missing is
how to do this from the outside, from a client... working on that next).
> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch,
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we init after master starts. It will
> only check WAL dirs and compare to Zookeeper RS nodes to decide which server
> need to expire. For servers which's dir is ending with 'SPLITTING', we assure
> that there will be a SCP for it.
> But, if the server with the meta region crashed before master restarts, and
> if all the procedure wals are lost (due to bug, or deleted manually,
> whatever), the new restarted master will be stuck when initing. Since no one
> will bring meta region online.
> Although it is an anomaly case, but I think no matter what happens, we need
> to online meta region. Otherwise, we are sitting ducks, noting can be done.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)