[
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585114#comment-16585114
]
stack commented on HBASE-19121:
-------------------------------
Chatting with [~allan163] and [~Apache9], major concern is loss of master proc
wals. If gone, mis-deleted, or damaged, then the cluster is hosed. Can't have
this. Redundancy? How to have a redundant master proc WAL? Or can we leave
breadcrumbs, as we used to try in hbck1 days, that allow us to rebuild if all
is trashed? How? We have some file-based droppings. Will use these for now,
though we would like to move away from depending on the particularities of how
we persist to the fs.
For hbase2, minimally:
* A rebuild procedure that can put the cluster back together after catastrophe.
Rebuild procedure might be composed of multiple fix-it procedures that an
operator would run via hbck2. hbck2 would require at least a minimal Master
running ("maintenance mode"). Best if no dependency on RSs.
* But only ever one master at a time! Even if a minimal one.
* One procedure would repair meta. It would work through the minimal master. It
would look for meta WAL logs for recovery. It'd run splitting inline rather
than try to farm it out to the cluster, to minimize dependency on RSs being up.
It'd dump the recovered.edits into place. It might then open the meta region
for hbck2 to read.
* hbck2 would report the troublesome RITs, or unfinished splits or merges.
* A procedure to look for -SPLITTING RS dirs for queuing new SCPs.
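
A rough sketch of the last item, looking for RS WAL dirs marked as splitting so
that new SCPs can be queued for them. Everything here is an assumption for
illustration: the class and method names are hypothetical, it uses
java.nio.file against a local directory rather than the Hadoop FileSystem API,
and it matches the "-splitting" suffix case-insensitively since the exact
on-disk case is not stated above. It only lists candidate server names; it does
not queue anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HBase or hbck2.
class SplittingDirScanner {
    // Assumed marker suffix on a RegionServer WAL directory name.
    static final String SUFFIX = "-splitting";

    /** Returns, sorted, the server names whose WAL dirs carry the suffix. */
    static List<String> findSplittingServers(Path walRoot) throws IOException {
        List<String> servers = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(walRoot)) {
            for (Path dir : dirs) {
                String name = dir.getFileName().toString();
                boolean marked = name.toLowerCase(Locale.ROOT).endsWith(SUFFIX);
                if (Files.isDirectory(dir) && marked) {
                    // Strip the suffix to recover the server name.
                    servers.add(name.substring(0, name.length() - SUFFIX.length()));
                }
            }
        }
        Collections.sort(servers);
        return servers;
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the WAL root directory as the only argument.
        for (String server : findSplittingServers(Paths.get(args[0]))) {
            System.out.println("candidate for new SCP: " + server);
        }
    }
}
```

An operator-facing tool would presumably print this list for review before
scheduling any SCP.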
Other hbck2 features:
* Move aside the master proc wals.
* Force completion of a procedure. Can't kill Procedures. Rollback doesn't
always work. Procedures may be subprocedures. Need to have them complete so the
parent can complete. Then operator does fixup. When forcing completion, need to
release locks too... else operator or new procedures to fix cannot make
progress.
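
A minimal sketch of "move aside the master proc wals", assuming the proc WALs
live in a single directory that can be renamed in one step. The class name,
method name, and the ".sidelined"-style tag are hypothetical, and java.nio.file
stands in for the Hadoop FileSystem API that real tooling would use. The rename
is requested as atomic so that a failed move leaves the original dir intact
instead of half-copied.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of HBase or hbck2.
class ProcWalSideliner {
    /**
     * Renames procWalDir to a sibling named "<dir>.<tag>" and returns the new
     * path. Fails, leaving the source untouched, if an atomic rename is not
     * possible.
     */
    static Path moveAside(Path procWalDir, String tag) throws IOException {
        Path target = procWalDir.resolveSibling(procWalDir.getFileName() + "." + tag);
        return Files.move(procWalDir, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the proc WAL directory; the tag here is a timestamp.
        Path moved = moveAside(Paths.get(args[0]), String.valueOf(System.currentTimeMillis()));
        System.out.println("moved aside to " + moved);
    }
}
```

The master would need to be stopped, or at least not writing proc WALs, while
this runs; that precondition is not checked here.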
> HBCK for AMv2 (A.K.A HBCK2)
> ---------------------------
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
> Issue Type: Bug
> Components: hbck
> Reporter: stack
> Assignee: Umesh Agashe
> Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going
> against AMv2.
> Fix.