[ 
https://issues.apache.org/jira/browse/HBASE-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095136#comment-17095136
 ] 

Huaxiang Sun commented on HBASE-24255:
--------------------------------------

Thanks [~timoha] for explaining. I was excited and jumped to a conclusion too 
quickly, sorry for the noise. Here is my understanding of what happened; please 
correct me if it is wrong.
 # The parent regions were already merged, but somehow the merge*** qualifiers 
were not cleaned up from the new merged child region in the meta table (maybe 
the master crashed before GCMultipleMergedRegionsProcedure was started).
 # HBCK2's addMissingRegionsInMeta onlined the parent regions, and they got 
assigned/opened on region servers.
 # Catalog Janitor's cleanMergeRegion() kicked off 
GCMultipleMergedRegionsProcedure, which assumes that the parent regions are 
already closed and deletes their entries from the meta table and archives the 
regions in the fs.

IMO, at step 2, addMissingRegionsInMeta needs to check whether a region is a 
merged parent region and, if it is, abort the operation.
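
A rough sketch of what the step 2 check could look like. This is not HBCK2's 
actual code: the meta table is modelled as a plain map, the "merge" qualifier 
prefix and the convention that a merge qualifier's value names the parent 
region are simplifying assumptions, and MergedParentCheck and its methods are 
hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the step 2 check; not HBCK2 code. */
final class MergedParentCheck {
    // Assumed prefix of the merge*** qualifiers left on the merged child row.
    static final String MERGE_QUALIFIER_PREFIX = "merge";

    /**
     * metaRows maps region name -> (qualifier -> value). For merge qualifiers
     * the value is modelled as the name of a parent region.
     */
    static boolean isMergedParent(String regionName,
                                  Map<String, Map<String, String>> metaRows) {
        for (Map<String, String> columns : metaRows.values()) {
            for (Map.Entry<String, String> column : columns.entrySet()) {
                if (column.getKey().startsWith(MERGE_QUALIFIER_PREFIX)
                        && column.getValue().equals(regionName)) {
                    return true; // some child row still references this region
                }
            }
        }
        return false;
    }

    /** Re-adds a region to the modelled meta table unless it is a merged parent. */
    static void addMissingRegion(String regionName,
                                 Map<String, Map<String, String>> metaRows) {
        if (isMergedParent(regionName, metaRows)) {
            throw new IllegalStateException(
                "Refusing to re-add merged parent region " + regionName);
        }
        metaRows.put(regionName, new HashMap<>());
    }
}
```

With a child row that still carries merge qualifiers for two parents, calling 
addMissingRegion on either parent throws instead of bringing the parent back.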

At step 3, GCMultipleMergedRegionsProcedure also needs to do a sanity check to 
make sure the parent regions are not online (a bit ugly).
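
A rough sketch of the step 3 sanity check, under the assumption that the 
procedure can obtain the set of regions currently open on region servers. 
MergedParentGcGuard and its methods are hypothetical names, and regions are 
represented as plain strings rather than HBase types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical guard to run before merged parents are garbage collected. */
final class MergedParentGcGuard {

    /** Returns the parents that are still online; an empty list means GC may proceed. */
    static List<String> onlineParents(List<String> parents, Set<String> onlineRegions) {
        List<String> stillOnline = new ArrayList<>();
        for (String parent : parents) {
            if (onlineRegions.contains(parent)) {
                stillOnline.add(parent);
            }
        }
        return stillOnline;
    }

    /** Throws instead of deleting meta entries while any parent is still open. */
    static void checkSafeToGc(List<String> parents, Set<String> onlineRegions) {
        List<String> stillOnline = onlineParents(parents, onlineRegions);
        if (!stillOnline.isEmpty()) {
            throw new IllegalStateException(
                "Merged parent regions are still online: " + stillOnline);
        }
    }
}
```

Failing fast here would leave the meta entries and region files in place, so an 
operator could close the parents first instead of ending up with orphans.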

> GCRegionProcedure doesn't assign region from RegionServer leading to orphans
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-24255
>                 URL: https://issues.apache.org/jira/browse/HBASE-24255
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2, Region Assignment, regionserver
>    Affects Versions: 2.2.4
>         Environment: hbase 2.2.4
> hadoop 3.1.3
>            Reporter: Andrey Elenskiy
>            Assignee: niuyulin
>            Priority: Major
>
> We've found ourselves in a situation where parents of merged or split regions 
> needed to be opened again on a regionserver because we had to recover from a 
> cluster meltdown (HBCK2's fixMeta kicks off GCMultipleMergedRegionsProcedure, 
> which requires all regions to be merged to be open). Then, when a 
> GCProcedure is kicked off by GCMultipleMergedRegionsProcedure to clean up a 
> parent region, it ends up deleting the region from hbase:meta but doesn't 
> unassign it from the RegionServer, so it shows up under "Orphan Regions on 
> RegionServer" in the hbck tab of the HBase Master. Also, the hbase client 
> doesn't detect that the region is closed, because it's still technically 
> open on a regionserver (the client doesn't reread hbase:meta all the time). 
> The only way to recover from this is to restart the regionserver, which 
> isn't ideal as it can lead to other issues in clusters with region 
> inconsistencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
