Daniel Roudnitsky created HBASE-28533:
-----------------------------------------

             Summary: Region split failure due to region quota limit leaves HMaster's in-memory state for the region in SPLITTING after procedure rollback
                 Key: HBASE-28533
                 URL: https://issues.apache.org/jira/browse/HBASE-28533
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 2.5.8
         Environment: HBase Version 2.5.8, r37444de6531b1bdabf2e445c83d0268ab1a6f919, Thu Feb 29 15:37:32 PST 2024
            Reporter: Daniel Roudnitsky


When a SplitTableRegionProcedure runs for a region whose namespace is already at 
its maximum region quota, the split procedure fails and rolls back, but HMaster's 
in-memory RegionStateNode for the region is left in the SPLITTING state. HMaster 
then refuses to start any subsequent merge, split, or move procedure for that 
region, because it believes the region is not OPEN. This persists until HMaster 
is restarted (for example, via a master failover) and its in-memory record of 
region states is rebuilt.

In the first step of the split procedure, SPLIT_TABLE_REGION_PREPARE, the parent 
region's RegionStateNode state is set to SPLITTING; this transition is not 
written to the meta table. In the next step, SPLIT_TABLE_REGION_PRE_OPERATION, 
the region quota check runs, a QuotaExceededException is thrown, and the 
procedure ends in the ROLLEDBACK state without reverting the RegionStateNode to 
OPEN. HMaster is left believing, according to its in-memory RegionStates, that 
the region is SPLITTING, while the region is still online on its assigned region 
server and still online according to meta.
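The sequence above can be modelled with a small, hypothetical Java sketch. The names SplitRollbackSketch, RegionNode, trySplit, and canStartMerge are illustrative only, not HBase classes or APIs, and the sketch assumes the quota check counts a split as one net new region (one parent replaced by two daughters). It only shows how an in-memory SPLITTING state survives a failed quota check unless the rollback explicitly restores OPEN; it is not the actual SplitTableRegionProcedure code or a proposed patch.

{code:java}
/**
 * Hypothetical, simplified model of the PREPARE / PRE_OPERATION steps
 * described in this issue. Not the real SplitTableRegionProcedure.
 */
public class SplitRollbackSketch {
    enum RegionState { OPEN, SPLITTING }

    /** Illustrative stand-in for HMaster's in-memory RegionStateNode. */
    static final class RegionNode {
        RegionState state = RegionState.OPEN;
    }

    static final class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    /**
     * Runs the two modelled steps. Returns true if the quota check passed.
     * revertOnRollback=false mirrors the reported behaviour (state left SPLITTING).
     */
    static boolean trySplit(RegionNode parent, int namespaceRegions, int maxRegions,
                            boolean revertOnRollback) {
        // PREPARE: in-memory transition only; nothing is written to meta here.
        parent.state = RegionState.SPLITTING;
        try {
            // PRE_OPERATION: assumed quota rule, a split adds one net region.
            if (namespaceRegions + 1 > maxRegions) {
                throw new QuotaExceededException("quota limits are exceeded");
            }
            return true;
        } catch (QuotaExceededException e) {
            // Rollback: restoring OPEN is the step this issue reports as missing.
            if (revertOnRollback && parent.state == RegionState.SPLITTING) {
                parent.state = RegionState.OPEN;
            }
            return false;
        }
    }

    /** Models the "is not OPEN" check that rejects a later merge/split/move. */
    static boolean canStartMerge(RegionNode node) {
        return node.state == RegionState.OPEN;
    }

    public static void main(String[] args) {
        RegionNode buggy = new RegionNode();
        trySplit(buggy, 2, 2, false);
        System.out.println("without revert: " + buggy.state + ", merge allowed=" + canStartMerge(buggy));

        RegionNode reverted = new RegionNode();
        trySplit(reverted, 2, 2, true);
        System.out.println("with revert: " + reverted.state + ", merge allowed=" + canStartMerge(reverted));
    }
}
{code}

With a namespace holding 2 regions and a limit of 2 (as in the reproduction below), main prints "without revert: SPLITTING, merge allowed=false" followed by "with revert: OPEN, merge allowed=true".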

To reproduce in the HBase shell:

{code:java}
> create_namespace 'test_ns', {'hbase.namespace.quota.maxregions'=> 2}
> create 'test_ns:test_table', 'f1', {NUMREGIONS => 2, SPLITALGO => 'UniformSplit'}
> region_a = <first region from list_regions 'test_ns:test_table'>
> region_b = <second region from list_regions 'test_ns:test_table'>

> split region_a, 'x'
# HMaster reports:
pid=405, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.quotas.QuotaExceededException via master-split-regions:org.apache.hadoop.hbase.quotas.QuotaExceededException: Region split not possible for :<region_a> as quota limits are exceeded ; SplitTableRegionProcedure table=test_ns:test_table, parent=...

> merge_region region_a, region_b
ERROR: org.apache.hadoop.hbase.exceptions.MergeRegionException: org.apache.hadoop.hbase.client.DoNotRetryRegionException: <region_a> is not OPEN; state=SPLITTING

> stop_master # trigger HMaster failover
> merge_region region_a, region_b # merge now succeeds
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
