[
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616711#comment-16616711
]
Duo Zhang commented on HBASE-21035:
-----------------------------------
OK, I ran into the same problem as you, sir [~stack]. When testing sync
replication, the table had not been created (or may have been deleted?) at the
STANDBY cluster, and when transiting the peer to DA the procedure kept retrying
and generated a bunch of procedure WALs. I created the table and restarted the
master, and then the master got stuck loading the procedures. I see that we
call recoverLease on every WAL file, sequentially, in a single thread, which is
really slow... Maybe we need to find a way to speed it up...
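
A minimal sketch of one possible speed-up, assuming lease recovery for
different WAL files is independent and can safely run concurrently (not
verified here). LeaseRecoverer, recoverAll and the String paths are
hypothetical stand-ins for illustration, not the actual procedure store code
or any HBase API. It only parallelizes the lease recovery step; the files
would still have to be read in order afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLeaseRecovery {

  /** Hypothetical stand-in for the per-file lease recovery call; not an HBase API. */
  public interface LeaseRecoverer {
    void recoverLease(String walPath) throws Exception;
  }

  /**
   * Submits one lease recovery task per WAL file to a fixed-size pool and
   * waits for all of them. Returns the paths in input order once every
   * recovery has finished; the first failure is rethrown as ExecutionException.
   */
  public static List<String> recoverAll(List<String> walPaths, LeaseRecoverer recoverer,
      int threads) throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String path : walPaths) {
        // Each task recovers the lease of a single file.
        futures.add(pool.submit(() -> {
          recoverer.recoverLease(path);
          return path;
        }));
      }
      List<String> recovered = new ArrayList<>();
      for (Future<String> future : futures) {
        // Block until this file is done; iteration keeps the input order.
        recovered.add(future.get());
      }
      return recovered;
    } finally {
      pool.shutdown();
    }
  }
}
```

The pool size is left as a parameter because the right degree of parallelism
would depend on how much load the NameNode can take during master startup.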
> Meta Table should be able to online even if all procedures are lost
> -------------------------------------------------------------------
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.1.0
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch,
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we init after the master starts. It
> will only check the WAL dirs and compare them to the Zookeeper RS nodes to
> decide which servers need to be expired. For servers whose dir ends with
> 'SPLITTING', we assure that there will be an SCP for them.
> But if the server with the meta region crashed before the master restarts,
> and if all the procedure WALs are lost (due to a bug, or deleted manually,
> whatever), the newly restarted master will be stuck when initializing, since
> no one will bring the meta region online.
> Although it is an anomalous case, I think that no matter what happens, we
> need to bring the meta region online. Otherwise, we are sitting ducks, and
> nothing can be done.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)