[ https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576229#comment-16576229 ]
Duo Zhang edited comment on HBASE-21035 at 8/12/18 12:48 PM:
-------------------------------------------------------------
What if you lose some edits of the meta region and then bring the meta region
online?
There are some assumptions in the code: the renaming of the WAL directory is
done in SCP, so we can be sure that there is an SCP if the WAL directory
already ends with '-splitting'. If this is not the case, we do not know what to
do. In your case you deleted all the procedures, but this is not the only
possible case, right? The damage is done from outside the system, or by
unexpected behavior, i.e., a serious bug in the code, so the decision should
also be made outside the system.
I think we can add an admin method to allow operators to submit an SCP for a
specific RS, and also add an option to HBCK that does the same thing as the
patch here: scan the WAL directory and re-submit an SCP for every RS whose WAL
directory ends with '-splitting'. But I'm strongly against adding this piece of
code to the normal master startup path. It is really dangerous.
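To make the operator-side scan concrete, here is a rough sketch of the
selection step only. It assumes the WAL directory names have already been
listed and that each one is a server name optionally followed by '-splitting'.
The class name, method name, and sample host names are made up for
illustration; this is not the code in the patch and not an existing HBCK
option, and the actual SCP submission is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class SplittingWalDirs {
    // Suffix mentioned in the discussion; assumed to be appended to the
    // server's WAL directory name when splitting starts.
    static final String SPLITTING_SUFFIX = "-splitting";

    /**
     * Returns the server names whose WAL directory name ends with
     * "-splitting". Input is the list of child directory names under the
     * WAL root (hypothetical input, listed by the caller).
     */
    static List<String> serversWithSplittingDirs(List<String> walDirNames) {
        List<String> servers = new ArrayList<>();
        for (String name : walDirNames) {
            // Skip names that are only the suffix: there is no server part.
            if (name.endsWith(SPLITTING_SUFFIX)
                    && name.length() > SPLITTING_SUFFIX.length()) {
                servers.add(name.substring(
                        0, name.length() - SPLITTING_SUFFIX.length()));
            }
        }
        return servers;
    }

    public static void main(String[] args) {
        // Sample directory names are invented for this sketch.
        List<String> dirs = List.of(
                "rs1.example.com,16020,1534000000000",
                "rs2.example.com,16020,1534000000001-splitting");
        for (String server : serversWithSplittingDirs(dirs)) {
            // An operator tool would submit an SCP here; that call is
            // deliberately omitted because it is outside this sketch.
            System.out.println("would submit SCP for " + server);
        }
    }
}
```

Keeping this as a separate, explicitly invoked step is the point: the operator
decides when to run it, rather than the master doing it on every startup.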
> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch
>
>
> After HBASE-20708, we changed the way we init after master starts. It will
> only check WAL dirs and compare them to the ZooKeeper RS nodes to decide
> which servers need to expire. For servers whose dir ends with '-splitting',
> we assume that there will be an SCP for it.
> But if the server hosting the meta region crashed before the master
> restarts, and all the procedure WALs are lost (due to a bug, manual
> deletion, whatever), the newly restarted master will be stuck while
> initializing, since no one will bring the meta region online.
> Although this is an anomalous case, I think that no matter what happens, we
> need to bring the meta region online. Otherwise we are sitting ducks and
> nothing can be done.