[
https://issues.apache.org/jira/browse/HBASE-21191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613006#comment-16613006
]
stack commented on HBASE-21191:
-------------------------------
bq. It is a bit of hack I think...
Smile. Thanks for taking a look.
bq. I think we don't have to scan the meta table to make sure it is online.
Sorry. Being paranoid. I suppose it would be hard for an RS to be in the online
set and in the RegionState without meta actually being deployed there. Let me
undo the scan bit.
bq. schedule a initMetaProc (don't if there is already one)
I don't want to schedule an assign in here. I want the operator to do it. See
our conversation over in HBASE-20786.
bq. Another opinion is that we don't have to wait namespace region.
Unfortunately, initClusterSchemaService is deceptive. It implements Guava
Service and does an async start, BUT starting the TableNamespaceManager inside
it is a blocking call. I would like to undo the blocking call, and undo
namespace as a distinct table, but that is a battle for another day.
Thanks for the review. Let me put up a patch that drops the verifying Scan.
> Add a holding-pattern if no assign for meta or namespace (Can happen if
> masterprocwals have been cleared).
> ----------------------------------------------------------------------------------------------------------
>
> Key: HBASE-21191
> URL: https://issues.apache.org/jira/browse/HBASE-21191
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21191.branch-2.1.001.patch,
> HBASE-21191.branch-2.1.002.patch
>
>
> If the masterprocwals have been removed -- operator error, hdfs dataloss, or
> because we have gotten ourselves into a pathological state where we have
> hundreds of masterprocwals to process and it is taking too long so we just
> want to start over -- then master startup will have a dilemma. Master startup
> needs hbase:meta to be online. If the masterprocwals have been removed, there
> may be no outstanding assign or a servercrashprocedure with coverage for
> hbase:meta (I ran into this issue repeatedly in internal testing purging
> masterprocwals on a large test cluster). Worse, when master startup cannot
> find an online hbase:meta, it exits after exhausting the RPC retries.
> So, we need a holding-pattern for master startup when hbase:meta is not
> online, if only so an operator can schedule an assign for meta or schedule
> fixup procedures (HBASE-20786 has discussion on why we cannot just
> auto-schedule an assign of meta).
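
The holding-pattern described above could be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class name MetaHoldingPattern, the method waitForMetaOnline, and the two suppliers are hypothetical stand-ins for whatever checks the master really performs. The idea shown is that startup polls and logs instead of exiting, and never schedules the assign itself, so the operator stays in control.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a startup holding-pattern; not the actual HBase implementation. */
final class MetaHoldingPattern {

    /**
     * Polls until metaOnline reports true. Does NOT schedule an assign; it only
     * waits so that an operator can intervene.
     *
     * @return true if meta came online, false if a stop was requested or the
     *         waiting thread was interrupted.
     */
    static boolean waitForMetaOnline(BooleanSupplier metaOnline,
                                     BooleanSupplier stopRequested,
                                     long pollMillis) {
        int polls = 0;
        while (!metaOnline.getAsBoolean()) {
            if (stopRequested.getAsBoolean()) {
                return false;
            }
            // Log on the first poll and then every tenth poll to avoid flooding the log.
            if (polls % 10 == 0) {
                System.out.println("hbase:meta is not online; holding until an operator assigns it");
            }
            polls++;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulate an operator getting meta assigned roughly 200 ms after startup begins.
        final long onlineAt = System.currentTimeMillis() + 200;
        boolean online = waitForMetaOnline(
            () -> System.currentTimeMillis() >= onlineAt,
            () -> false,
            50L);
        System.out.println("meta online: " + online);
    }
}
```

Passing the online check and the stop check in as suppliers keeps the loop independent of any real master internals, which is why no HBase classes appear in the sketch.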