[
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607502#comment-16607502
]
stack commented on HBASE-19121:
-------------------------------
h2. Horror Story
Big cluster. Lots of regions. A couple of STUCK procedures prevent clean-up
of old WALs, so a backlog builds. The Master crashes (for some unrelated
reason). A new Master tries to become the active Master. It reads the
outstanding MasterProcWAL logs to reconstruct assignment. If the backlog is
large, this can take hours.
HBASE-21165 describes an instance with 700 servers and 420k regions. The Master
is taking hours to put assignment back together from backed-up logs (~300 logs
and, I think, a few million procedures). HBASE-21165 is adding state emission
because otherwise it looks like we are hung.
Need to support removal of all MasterProcWALs and have the Master come up
anyway, as per the notes above.
> HBCK for AMv2 (A.K.A HBCK2)
> ---------------------------
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
> Issue Type: Bug
> Components: hbck
> Reporter: stack
> Assignee: Umesh Agashe
> Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going
> against AMv2.
> Fix.