[
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889062#comment-16889062
]
stack commented on HBASE-21745:
-------------------------------
A few thoughts on remaining items:
* Fix region holes, overlaps, and other errors in the region chain
* Fix failed split and merge transactions that have failed to roll back due to
some bug (related to previous)
There are holes and overlaps in hbase:meta and then there are holes and
overlaps in the filesystem (hdfs). In the past, hbck1 would fix 'holes and
overlaps' in hdfs first; hbase:meta would then be consulted and adjusted to
pick up the hdfs changes. Let's not do it this way for hbck2 (caveat:
HBASE-22567, which finds hbase:meta holes and, if an hdfs region exists for
the hole, hoists it up into hbase:meta). In hbck2, perhaps the Master itself
can see 'holes' and 'overlaps' in hbase:meta. The Master already runs a
periodic process, CatalogJanitor (CJ), to 'check' hbase:meta. It could
minimally report holes and overlaps (as well as unknown servers, etc.). I was
going to have a look at doing this. CJ could report its findings to the UI
(after the [~zghaobac] new tendency).
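To make 'holes' and 'overlaps' concrete, here is a minimal sketch of the kind of check a periodic scan could run over one table's regions. It is not the CatalogJanitor code. The class and method names are hypothetical, and keys are plain strings rather than byte arrays. It assumes the regions form a chain when each region's end key equals the next region's start key, with an empty key marking the start or end of the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
final class RegionChainCheck {

    /** Simplified stand-in for one region row in hbase:meta. */
    static final class Region {
        final String name;
        final String startKey; // "" means start of table
        final String endKey;   // "" means end of table

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns a report line for every hole or overlap found in one table. */
    static List<String> check(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));
        List<String> findings = new ArrayList<>();
        if (sorted.isEmpty()) {
            return findings;
        }
        if (!sorted.get(0).startKey.isEmpty()) {
            findings.add("hole before " + sorted.get(0).name);
        }
        // 'prev' is the region whose end key reaches furthest so far.
        Region prev = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            Region cur = sorted.get(i);
            boolean prevOpenEnded = prev.endKey.isEmpty();
            int cmp = prevOpenEnded ? 1 : prev.endKey.compareTo(cur.startKey);
            if (cmp > 0) {
                findings.add("overlap: " + prev.name + " and " + cur.name);
            } else if (cmp < 0) {
                findings.add("hole between " + prev.name + " and " + cur.name);
            }
            // Advance only if 'cur' reaches further than 'prev'.
            if (!prevOpenEnded
                    && (cur.endKey.isEmpty() || cur.endKey.compareTo(prev.endKey) > 0)) {
                prev = cur;
            }
        }
        if (!prev.endKey.isEmpty()) {
            findings.add("hole after " + prev.name);
        }
        return findings;
    }

    public static void main(String[] args) {
        List<Region> table = Arrays.asList(
            new Region("a", "", "b"),
            new Region("b", "b", "d"),
            new Region("c", "c", "e"),   // starts before "b" ends
            new Region("d", "f", ""));   // nothing covers [e, f)
        check(table).forEach(System.out::println);
    }
}
```

A check like this only reports. Deciding how to repair each finding is a separate step.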
What about leftover directories in hdfs? Orphans and broken regions or broken
tables? In hdfs, hbck1 used to have the notion of 'adoption' where a new region
was created in a target table and the 'orphan' region's content was copied into
the new location. Thereafter, there'd be machinations to get the new region up
into hbase:meta. What if we ran an 'adoption service' in the Master where hbck2
would pass the Master a list of directories and tell the Master to 'adopt' the
content, whether files, dropped regions, overlapping dirs, or even tables? The
Master's hbase:meta would have to be healthy first so that new data has a home
to go to.
On fixing split and merge transactions: we should roll this category of issues
up into the general Master fix described above, where something like CJ
recognizes any problem (it already does a bunch of the heavy lifting for
splits/merges). The 'HBASE-21965 Fix failed split and merge transactions that
have failed to roll back' "fix" above has actually been undone for now in favor
of "HBASE-22709 Add a web ui to show the failed splited/merged regions", whose
intent is to list failed splits/merges in the UI along with recipes for fixing
them.
And then perhaps a release of hbase-operator-tools?
> Make HBCK2 be able to fix issues other than region assignment
> -------------------------------------------------------------
>
> Key: HBASE-21745
> URL: https://issues.apache.org/jira/browse/HBASE-21745
> Project: HBase
> Issue Type: Umbrella
> Components: hbase-operator-tools, hbck2
> Reporter: Duo Zhang
> Assignee: stack
> Priority: Critical
>
> This is what [~apurtell] posted on mailing-list, HBCK2 should support
> * -Rebuild meta from region metadata in the filesystem, aka offline meta
> rebuild.-
> * -Fix assignment errors (undeployed regions, double assignments (yes,
> should not be possible), etc)- (See
> https://issues.apache.org/jira/browse/HBASE-21745?focusedCommentId=16888302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16888302)
> * Fix region holes, overlaps, and other errors in the region chain
> * Fix failed split and merge transactions that have failed to roll back due
> to some bug (related to previous)
> * -Enumerate store files to determine file level corruption and sideline
> corrupt files-
> * -Fix hfile link problems (dangling / broken)-