[ 
https://issues.apache.org/jira/browse/HBASE-28533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838531#comment-17838531
 ] 

Wellington Chevreuil commented on HBASE-28533:
----------------------------------------------

Thanks for reporting this and for the detailed troubleshooting explanation. Can 
you confirm whether this also affects newer releases? Also, please let me know 
if you plan to work on a fix for this, so we can get the Jira assigned to you, 
[~droudy].

> Region split failure due to region quota limit leaves Hmaster's in memory 
> state for the region in SPLITTING after procedure rollback
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28533
>                 URL: https://issues.apache.org/jira/browse/HBASE-28533
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.5.8
>         Environment: HBase Version 2.5.8, 
> r37444de6531b1bdabf2e445c83d0268ab1a6f919, Thu Feb 29 15:37:32 PST 2024
>            Reporter: Daniel Roudnitsky
>            Priority: Major
>
> When a SplitTableRegionProcedure is run for a region whose namespace is at 
> its maximum region quota limit, the split procedure fails and rolls back, 
> but HMaster's in-memory RegionStateNode for the region is left in the 
> SPLITTING state. HMaster then refuses to start any subsequent 
> merge/split/move procedures for that region because it believes the region 
> is not OPEN, until it is restarted and the in-memory record of region states 
> is reset.
> In the first step of the split procedure, SPLIT_TABLE_REGION_PREPARE, the 
> parent region's RegionStateNode state is set to SPLITTING, and the transition 
> is not written to the meta table. In the next step, 
> SPLIT_TABLE_REGION_PRE_OPERATION, the region quota check is done, a 
> QuotaExceededException is thrown, and the procedure ends in the ROLLEDBACK 
> state without reverting the RegionStateNode to OPEN. HMaster is left 
> believing the region is in the SPLITTING state according to its in-memory 
> RegionStates, while the region is still online, both on the assigned region 
> server and according to meta.
> To reproduce in HBase shell:
> {code:java}
> > create_namespace 'test_ns', {'hbase.namespace.quota.maxregions'=> 2}
> > create 'test_ns:test_table', 'f1', {NUMREGIONS => 2, SPLITALGO => 'UniformSplit'}
> > region_a = <first region from list_regions 'test_ns:test_table'>
> > region_b = <second region from list_regions 'test_ns:test_table'>
> > split region_a, 'x'
> # HMaster will report: 
> pid=405, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.quotas.QuotaExceededException via 
> master-split-regions:org.apache.hadoop.hbase.quotas.QuotaExceededException: 
> Region split not possible for :<region_a> as quota limits are exceeded ; 
> SplitTableRegionProcedure table=test_ns:test_table, parent=...
> > merge_region region_a, region_b
> ERROR: org.apache.hadoop.hbase.exceptions.MergeRegionException: 
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: <region_a> is not 
> OPEN; state=SPLITTING
> > stop_master # trigger HMaster failover 
> > merge_region region_a, region_b # merge now succeeds {code}
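
For readers following along, the sequence described in the report can be modeled with a small standalone Java sketch. It does not use any HBase classes: RegionNode, QuotaExceeded, runSplit and the revertOnRollback flag are hypothetical names invented for illustration, and the quota check is reduced to "a split adds one region to the namespace". It only shows how a state change made in the first step survives a rollback when no undo action is recorded for it, and is not a description of the actual fix.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SplitRollbackSketch {
    enum RegionState { OPEN, SPLITTING }

    /** Hypothetical stand-in for the master's in-memory RegionStateNode. */
    static final class RegionNode {
        RegionState state = RegionState.OPEN;
    }

    /** Hypothetical stand-in for QuotaExceededException. */
    static final class QuotaExceeded extends RuntimeException {
        QuotaExceeded(String message) { super(message); }
    }

    /**
     * Runs the two steps named in the report against an in-memory node.
     * With revertOnRollback == false, no undo action is recorded for the
     * state change, which models the reported behavior.
     */
    static String runSplit(RegionNode node, int regionsInNamespace,
                           int maxRegions, boolean revertOnRollback) {
        Deque<Runnable> undoActions = new ArrayDeque<>();
        try {
            // Step 1 (PREPARE): mark the parent region as SPLITTING in memory only.
            RegionState previous = node.state;
            node.state = RegionState.SPLITTING;
            if (revertOnRollback) {
                undoActions.push(() -> node.state = previous);
            }
            // Step 2 (PRE_OPERATION): simplified quota check (assumption:
            // one parent is replaced by two daughters, a net gain of one region).
            if (regionsInNamespace + 1 > maxRegions) {
                throw new QuotaExceeded("quota limits are exceeded");
            }
            return "SUCCESS";
        } catch (QuotaExceeded e) {
            // Rollback: run whatever undo actions were recorded, newest first.
            while (!undoActions.isEmpty()) {
                undoActions.pop().run();
            }
            return "ROLLEDBACK";
        }
    }

    public static void main(String[] args) {
        RegionNode stale = new RegionNode();
        // Namespace already holds 2 regions with a limit of 2, as in the repro.
        System.out.println(runSplit(stale, 2, 2, false) + " " + stale.state);

        RegionNode reverted = new RegionNode();
        System.out.println(runSplit(reverted, 2, 2, true) + " " + reverted.state);
    }
}
```

In the first call the procedure ends as ROLLEDBACK while the node still reads SPLITTING, which matches the state the merge_region error reports; in the second call the recorded undo action restores OPEN.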



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
