[
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614382#comment-16614382
]
stack commented on HBASE-21035:
-------------------------------
I've been trying to make basic progress on HBCK2. I pushed up an HBCK2 tool
that can call our only Hbck method over in the hbase-operator-tools project:
https://github.com/apache/hbase-operator-tools/commit/0cf0e0ecf2d4a33522e0e273f9310f11aa2eaee6.
It is still missing a lot -- tests, packaging, a way to pass in a pointer to the
cluster to fix, docs, etc. -- but I'm working on it.
Next is adding assign and bulk assign to the Hbck Service. This Hbck assign will
differ from Admin assign in that it should work even while the Master is
'initializing' (Admin assign fails because we check master state before we do
anything -- which means we can't schedule a meta assign if meta is offline). The
hbck assign will also bypass stuff like calling CPs. I also want bulk assign -- i.e.
passing a thousand regions at a time to assign -- because when doing repairs,
clusters will probably be big with lots of regions in odd states. I've been
running a fixup job on a cluster where I have thousands of regions in OPENING
state (I removed the Master WAL Procs after crashing it... ). Doing assigns one
at a time on the command line doesn't cut it: each assign takes 10-40 seconds.
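
To make the bulk-assign idea concrete, here is a minimal sketch of the
client-side batching only. It assumes nothing about the eventual Hbck Service
API: `batches`, `bulk_assign`, and the `submit` callback are hypothetical names
invented for this illustration, and `submit` stands in for whatever single
request ends up carrying a list of regions to the Master.

```python
from typing import Callable, Iterator, List, Sequence


def batches(regions: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` region names."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(regions), size):
        yield list(regions[start:start + size])


def bulk_assign(
    regions: Sequence[str],
    submit: Callable[[List[str]], None],
    size: int = 1000,
) -> int:
    """Hand regions to `submit` one batch at a time; return the number of calls.

    `submit` is a hypothetical stand-in for one bulk-assign request; it is
    not a real HBase call.
    """
    calls = 0
    for batch in batches(regions, size):
        submit(batch)
        calls += 1
    return calls
```

With an illustrative 2,500 regions and a batch size of 1,000, this makes 3
requests instead of 2,500. At 10-40 seconds per single assign, the
one-at-a-time approach would take roughly 7 to 28 hours for the same set.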
> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch,
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way the master initializes after it starts. It
> now only checks WAL dirs and compares them to the ZooKeeper RS nodes to decide
> which servers need to expire. For servers whose dir ends with 'SPLITTING', we
> ensure that there will be an SCP for each.
> But if the server hosting the meta region crashed before the master restarted,
> and all the procedure WALs are lost (due to a bug, manual deletion, whatever),
> the newly restarted master will be stuck initializing, since no one will bring
> the meta region online.
> Although this is an anomalous case, I think that no matter what happens, we
> need to be able to online the meta region. Otherwise we are sitting ducks;
> nothing can be done.