[
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618590#comment-16618590
]
Duo Zhang commented on HBASE-21035:
-----------------------------------
Theoretically there are changes: the state goes to WAITING_TIMEOUT and then
back to RUNNABLE, and the retry count, the timeout value, etc. also change. But
is it necessary to persist this? I do not think it is necessary to restore the
WAITING_TIMEOUT state. If there is a crash and restart, the procedure can just
start as if there had been no earlier failure. For example, when we assign a
region and the assignment fails, we move the procedure to the WAITING_TIMEOUT
state and want it to sleep for 30 seconds. If the master then crashes and
restarts, I think it is OK to reschedule the TRSP immediately after the
restart. What do you guys think?
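To make the proposal concrete, here is a minimal Java sketch, assuming a simplified procedure model. The class, field, and method names (BackoffStateSketch, AssignProcedure, onAssignFailure, persist, restore) are hypothetical and are not the real HBase procedure API. Only the region and a terminal or RUNNABLE state are persisted; the retry count, the backoff timeout, and WAITING_TIMEOUT live only in memory, so a restored procedure starts RUNNABLE with a fresh retry count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; not the real HBase Procedure API.
public class BackoffStateSketch {

    enum State { RUNNABLE, WAITING_TIMEOUT, SUCCESS }

    static final class AssignProcedure {
        final String region;          // durable: persisted to the procedure store
        State state = State.RUNNABLE; // WAITING_TIMEOUT is never written out
        int attempts = 0;             // transient: retry count
        long timeoutMs = 0;           // transient: backoff before the next attempt

        AssignProcedure(String region) { this.region = region; }

        // An assign attempt failed: back off in memory only.
        void onAssignFailure() {
            attempts++;
            timeoutMs = 30_000L;      // e.g. sleep for 30 seconds
            state = State.WAITING_TIMEOUT;
        }

        // Serialize only the durable part; WAITING_TIMEOUT is stored as RUNNABLE.
        Map<String, String> persist() {
            Map<String, String> m = new HashMap<>();
            m.put("region", region);
            m.put("state", state == State.SUCCESS ? "SUCCESS" : "RUNNABLE");
            return m;
        }

        // After a master restart the procedure starts as if it never failed.
        static AssignProcedure restore(Map<String, String> m) {
            AssignProcedure p = new AssignProcedure(m.get("region"));
            p.state = State.valueOf(m.get("state"));
            return p;
        }
    }

    public static void main(String[] args) {
        AssignProcedure p = new AssignProcedure("region-a");
        p.onAssignFailure();
        AssignProcedure restored = AssignProcedure.restore(p.persist());
        System.out.println(restored.state + " " + restored.attempts + " " + restored.timeoutMs);
    }
}
```

Running main prints "RUNNABLE 0 0": the restored procedure is rescheduled immediately instead of finishing its 30 second sleep. The trade-off is that a crash resets the backoff, which seems acceptable because restarts are rare.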
> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch,
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we initialize after the master starts.
> It only checks the WAL dirs and compares them to the ZooKeeper RS nodes to
> decide which servers need to be expired. For servers whose dir ends with
> 'SPLITTING', we assume that there will already be an SCP for them.
> But if the server hosting the meta region crashed before the master
> restarts, and all the procedure WALs are lost (due to a bug, manual deletion,
> whatever), the restarted master will be stuck while initializing, since no
> one will bring the meta region online.
> Although it is an anomalous case, I think that no matter what happens, we
> need to bring the meta region online. Otherwise we are sitting ducks and
> nothing can be done.
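The failure mode described above can be illustrated with a small Java sketch of the startup check. This is a hypothetical illustration of the invariant, not the actual patch: the class and method names are made up, server names are assumed to use the host,port,startcode form, and the "-splitting" dir suffix is an assumption for the sketch. The point is that a splitting WAL dir is not trusted to imply an existing SCP; any dead server with no known SCP, including the one that carried meta, gets one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the startup check; not the actual HBase patch or API.
public class MetaRecoverySketch {

    // Assumed suffix for WAL dirs of servers whose logs are being split.
    static final String SPLITTING_SUFFIX = "-splitting";

    static String serverName(String walDir) {
        return walDir.endsWith(SPLITTING_SUFFIX)
            ? walDir.substring(0, walDir.length() - SPLITTING_SUFFIX.length())
            : walDir;
    }

    // Dead servers (WAL dir present, no ZooKeeper node) with no known SCP.
    // A splitting dir does not prove an SCP exists, because the procedure
    // WALs that held it may have been lost.
    static List<String> serversNeedingScp(List<String> walDirs,
                                          Set<String> liveInZk,
                                          Set<String> serversWithScp) {
        List<String> result = new ArrayList<>();
        for (String dir : walDirs) {
            String server = serverName(dir);
            if (!liveInZk.contains(server) && !serversWithScp.contains(server)) {
                result.add(server);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> walDirs =
            List.of("rs1,16020,1", "rs2,16020,1-splitting", "rs3,16020,1");
        Set<String> live = Set.of("rs3,16020,1");
        Set<String> scps = Set.of(); // all procedure WALs lost
        // Suppose rs2 hosted meta: without a new SCP, nobody brings meta online.
        System.out.println(serversNeedingScp(walDirs, live, scps));
    }
}
```

With every procedure WAL gone, main prints "[rs1,16020,1, rs2,16020,1]", so the server that held meta is expired again and meta can be reassigned. If an SCP for rs2 had been recovered, rs2 would be skipped and no duplicate SCP scheduled.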
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)