[
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095956#comment-17095956
]
Andrey Elenskiy commented on HBASE-24255:
-----------------------------------------
[~huaxiangsun] yes, you got the idea right.
> but somehow the merge*** qualifiers were not cleaned up from new merged child
> region in meta table (maybe master crashed before
> GCMultipleMergedRegionsProcedure is started)
That's actually due to HBASE-24273: addMissingRegionsInMeta reads those
"orphans" without checking whether a merge qualifier exists. I think fixing
HBASE-24273 will resolve this particular instance.
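To illustrate the missing check, here is a minimal toy sketch in Python. It is not HBase code: the function name, the dict-based stand-in for hbase:meta, and the "merge_parents" key are all hypothetical. It only shows the idea of skipping regions that a merged child still references before treating them as missing from meta.

```python
# Toy model, not HBase code: every name below is hypothetical.

def find_missing_regions(fs_regions, meta_rows):
    """Return regions found on the filesystem but absent from meta,
    skipping any that a merged child still lists as a merge parent."""
    in_meta = set(meta_rows)
    # Parents still referenced by a child's merge qualifiers are waiting
    # for garbage collection, so they are not genuinely missing.
    pending_gc = {
        parent
        for row in meta_rows.values()
        for parent in row.get("merge_parents", [])
    }
    return sorted(set(fs_regions) - in_meta - pending_gc)


# Regions "a" and "b" were merged into "c"; their directories still exist.
fs_regions = ["a", "b", "c"]
meta_rows = {"c": {"merge_parents": ["a", "b"]}}
missing = find_missing_regions(fs_regions, meta_rows)
```

In this sketch `missing` is empty, because both parents are still referenced by the child's merge qualifiers. Without the `pending_gc` filter, "a" and "b" would be reported as missing and re-added.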
But I'm still wondering whether there are other situations where
GCRegionProcedure should also make sure that the region is unassigned from the
regionserver. That would be a more generic fix, as I've seen this happen even
without region merges (I don't recall the case anymore).
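As a rough illustration of why the ordering matters, the following toy Python model (again not HBase code; the class, its fields, and the `unassign_first` flag are hypothetical) shows how removing a region from the catalog without first closing it on its server leaves an orphan, while unassigning first does not.

```python
# Toy model, not HBase code: class and method names are hypothetical.

class ToyCluster:
    def __init__(self):
        self.meta = set()    # regions listed in the catalog
        self.online = {}     # region -> server currently hosting it

    def open_region(self, region, server):
        self.meta.add(region)
        self.online[region] = server

    def gc_region(self, region, unassign_first):
        # The generic fix discussed above: close the region on its server
        # before removing its catalog entry.
        if unassign_first:
            self.online.pop(region, None)
        self.meta.discard(region)

    def orphans(self):
        # Regions still open on a server but no longer present in the catalog.
        return sorted(r for r in self.online if r not in self.meta)


cluster = ToyCluster()
cluster.open_region("parent-1", "rs1")
cluster.open_region("parent-2", "rs1")
cluster.gc_region("parent-1", unassign_first=False)  # leaves an orphan
cluster.gc_region("parent-2", unassign_first=True)   # cleaned up fully
orphaned = cluster.orphans()
```

Here only "parent-1" ends up orphaned, which mirrors the state described in the issue: the region is gone from the catalog but still open on a server.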
> GCRegionProcedure doesn't assign region from RegionServer leading to orphans
> ----------------------------------------------------------------------------
>
> Key: HBASE-24255
> URL: https://issues.apache.org/jira/browse/HBASE-24255
> Project: HBase
> Issue Type: Bug
> Components: proc-v2, Region Assignment, regionserver
> Affects Versions: 2.2.4
> Environment: hbase 2.2.4
> hadoop 3.1.3
> Reporter: Andrey Elenskiy
> Assignee: niuyulin
> Priority: Major
>
> We've found ourselves in a situation where parents of merged or split regions
> needed to be opened again on a regionserver because we had to recover from a
> cluster meltdown (HBCK2's fixMeta kicks off GCMultipleMergedRegionsProcedure,
> which requires all regions to be merged to be open). Then, when a
> GCProcedure is kicked off by GCMultipleMergedRegionsProcedure to clean a
> parent region up, it ends up deleting it from hbase:meta but
> doesn't unassign it from the RegionServer, so it shows up in "Orphan
> Regions on RegionServer" in the hbck tab of HBase Master. Also, the hbase client
> doesn't detect that the region is closed either, because it's still
> technically open on a regionserver (it doesn't reread hbase:meta all the
> time). The only way to recover from this is to restart the regionserver, which
> isn't ideal as it can lead to other issues in clusters with region
> inconsistencies.