[ 
https://issues.apache.org/jira/browse/HBASE-21191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612990#comment-16612990
 ] 

Allan Yang commented on HBASE-21191:
------------------------------------

Just reviewed the patch. It is a bit of a hack, I think...
I don't think we have to scan the meta table to make sure it is online. We can 
just check the RegionState.
The logic should be like this:
if the RegionState of meta shows offline, then
 schedule an initMetaProc (unless there is already one)
if the RegionState of meta shows online, then
 check whether the server is alive
if the server is not alive, then
 loop here, checking the RegionState, until meta is online on a live 
 server. Meanwhile, we can log a message telling the operator to do 
 something, as in the patch.
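
The decision logic above could be sketched in Java roughly as follows. This is 
only an illustration: MetaStartupGate, MetaSnapshot, Action, decide() and 
holdUntilMetaOnline() are hypothetical names, not HBase classes or methods. It 
also assumes that we simply keep waiting when an initMetaProc is already 
pending, and it bounds the loop with maxPolls so the sketch terminates, whereas 
the real holding-pattern would presumably wait indefinitely.

```java
import java.util.function.Supplier;

/** Sketch only: none of these types exist in HBase. */
final class MetaStartupGate {

    /** Hypothetical snapshot of what the master knows about hbase:meta. */
    static final class MetaSnapshot {
        final boolean online;               // RegionState of meta shows online
        final boolean hostingServerAlive;   // server recorded as hosting meta is alive
        final boolean initMetaProcPending;  // an initMetaProc is already scheduled

        MetaSnapshot(boolean online, boolean hostingServerAlive, boolean initMetaProcPending) {
            this.online = online;
            this.hostingServerAlive = hostingServerAlive;
            this.initMetaProcPending = initMetaProcPending;
        }
    }

    enum Action { SCHEDULE_INIT_META, HOLD, PROCEED }

    /** One pass over the checks listed in the comment. */
    static Action decide(MetaSnapshot s) {
        if (!s.online) {
            // Offline: schedule an initMetaProc unless one already exists (assumed: then wait).
            return s.initMetaProcPending ? Action.HOLD : Action.SCHEDULE_INIT_META;
        }
        // Online: only proceed if the hosting server is actually alive.
        return s.hostingServerAlive ? Action.PROCEED : Action.HOLD;
    }

    /**
     * Polls until meta is online on a live server.
     * Returns the number of polls used, or -1 if maxPolls ran out or the thread was interrupted.
     */
    static int holdUntilMetaOnline(Supplier<MetaSnapshot> probe, Runnable scheduleInitMeta,
                                   long sleepMillis, int maxPolls) {
        for (int poll = 1; poll <= maxPolls; poll++) {
            Action action = decide(probe.get());
            if (action == Action.PROCEED) {
                return poll;
            }
            if (action == Action.SCHEDULE_INIT_META) {
                scheduleInitMeta.run();
            }
            // Tell the operator that manual intervention may be needed.
            System.err.println("hbase:meta is not online on a live server; operator action may be needed");
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```

The point of keeping decide() free of side effects is that the checks only read 
the RegionState and server liveness; nothing in it needs a scan of the meta 
table.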

Another thought is that we don't have to wait for the namespace region, since 
we are already doing that async in initClusterSchemaService(). 

> Add a holding-pattern if no assign for meta or namespace (Can happen if 
> masterprocwals have been cleared).
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21191
>                 URL: https://issues.apache.org/jira/browse/HBASE-21191
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.1.1
>
>         Attachments: HBASE-21191.branch-2.1.001.patch, 
> HBASE-21191.branch-2.1.002.patch
>
>
> If the masterprocwals have been removed -- operator error, hdfs dataloss, or 
> because we have gotten ourselves into a pathological state where we have 
> hundreds of masterprocwals to process and it is taking too long so we just 
> want to start over -- then master startup will have a dilemma. Master startup 
> needs hbase:meta to be online. If the masterprocwals have been removed, there 
> may be no outstanding assign or a servercrashprocedure with coverage for 
> hbase:meta (I ran into this issue repeatedly in internal testing purging 
> masterprocwals on a large test cluster). Worse, when master startup cannot 
> find an online hbase:meta, it exits after exhausting the RPC retries.
> So, we need a holding-pattern for master startup if hbase:meta is not online 
> if only so an operator can schedule an assign for meta or so they can assign 
> fixup procedures (HBASE-20786 has discussion on why we cannot just 
> auto-schedule an assign of meta).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
