[
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613009#comment-16613009
]
Allan Yang commented on HBASE-21035:
------------------------------------
Since [~stack] is running into this problem too, I want to mention that I
still think scheduling an SCP for a splitting server is doable (as in my
patch). I have already committed this patch to our internal version. We want to
put HBase 2.0 into production, and we can't afford to resolve this dilemma with
external tools like HBCK2 (which is not available yet).
It won't cause any trouble if multiple SCPs are submitted for a single server
(in case you are concerned about safe fencing, [~Apache9]). I still can't think
of any data loss or meta inconsistency case resulting from this. In fact, when
the master starts, [~Apache9] also uses a similar approach: check whether there
is already an InitMetaProcedure, and schedule one if not.
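To make the "check whether one exists, schedule one if not" idea concrete, here is a minimal sketch. It is not the attached patch and does not use real HBase classes: ScpSchedulerSketch, scheduleIfAbsent, recoverSplittingServers, and the "-splitting" suffix constant are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in HBase.
class ScpSchedulerSketch {
    // Assumed directory suffix marking a server whose WALs are being split.
    static final String SPLITTING_SUFFIX = "-splitting";

    // Servers for which a crash procedure is already known (assumed bookkeeping).
    private final Set<String> serversWithScp = new HashSet<>();
    // Order in which crash procedures were submitted, kept for inspection.
    private final List<String> submitted = new ArrayList<>();

    /** Submits a crash procedure only if none is known for this server. */
    boolean scheduleIfAbsent(String serverName) {
        if (!serversWithScp.add(serverName)) {
            return false; // one already exists, nothing to do
        }
        submitted.add(serverName);
        return true;
    }

    /** For every WAL dir marked as splitting, make sure its server has a procedure. */
    int recoverSplittingServers(List<String> walDirNames) {
        int scheduled = 0;
        for (String dir : walDirNames) {
            if (dir.endsWith(SPLITTING_SUFFIX)) {
                String server =
                    dir.substring(0, dir.length() - SPLITTING_SUFFIX.length());
                if (scheduleIfAbsent(server)) {
                    scheduled++;
                }
            }
        }
        return scheduled;
    }

    List<String> submitted() {
        return submitted;
    }
}
```

Because the check and the submit happen together, calling this more than once for the same server is a no-op, so it is safe to run at every master startup even when the procedure WALs turn out to be intact.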
> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch,
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we init after the master starts. It now
> only checks WAL dirs and compares them to ZooKeeper RS nodes to decide which
> servers need to expire. For servers whose dir ends with 'SPLITTING', we assume
> that there will be an SCP for them.
> But if the server hosting the meta region crashed before the master restarts,
> and all the procedure WALs are lost (due to a bug, manual deletion, or
> whatever), the newly restarted master will be stuck during initialization,
> since no one will bring the meta region online.
> Although this is an anomalous case, I think that no matter what happens, we
> need to bring the meta region online. Otherwise we are sitting ducks and
> nothing can be done.