Ravi Kishore Valeti created HBASE-28068:
-------------------------------------------

             Summary: Normalizer should batch merging 0 sized regions
                 Key: HBASE-28068
                 URL: https://issues.apache.org/jira/browse/HBASE-28068
             Project: HBase
          Issue Type: Improvement
          Components: Normalizer
    Affects Versions: 2.5.5
            Reporter: Ravi Kishore Valeti
             Fix For: 2.6.0, 3.0.0


In our production environment, while investigating an issue, we observed that 
the Normalizer had scheduled a single merge procedure on an RS and handed it 
27K+ empty regions of one table to merge (a failed copy table job had left 
these 27K+ empty regions behind).

This caused the procedure to get stuck, and the procedure framework eventually 
bailed out after ~40 mins. The same thing happened on every normalizer run 
until we deleted the table manually.
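
One way to picture the batching asked for in the summary is the sketch below. It is not HBase code: the class EmptyRegionMergeBatcher, the method batch, and the cap maxRegionsPerMerge are hypothetical names chosen for illustration, and the input is assumed to be one contiguous run of adjacent 0-sized regions in key order. The idea is only that a run of 27K+ empty regions is cut into bounded chunks, each of which would become its own merge procedure.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of HBase: splits one run of adjacent
 * empty regions into bounded batches, one batch per merge procedure.
 */
final class EmptyRegionMergeBatcher {

  private EmptyRegionMergeBatcher() {}

  /**
   * @param adjacentEmptyRegions encoded names of contiguous 0-sized regions,
   *                             in key order (assumption of this sketch)
   * @param maxRegionsPerMerge   hypothetical cap on regions per merge; must be >= 2
   * @return batches of at most maxRegionsPerMerge regions each
   */
  static List<List<String>> batch(List<String> adjacentEmptyRegions, int maxRegionsPerMerge) {
    if (maxRegionsPerMerge < 2) {
      throw new IllegalArgumentException("a merge needs at least two regions");
    }
    List<List<String>> batches = new ArrayList<>();
    int n = adjacentEmptyRegions.size();
    for (int start = 0; start < n; start += maxRegionsPerMerge) {
      int end = Math.min(start + maxRegionsPerMerge, n);
      // A lone trailing region cannot be merged by itself; it is left
      // for a later normalizer run instead of forming a batch of one.
      if (end - start >= 2) {
        batches.add(new ArrayList<>(adjacentEmptyRegions.subList(start, end)));
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    // Synthetic region names, for demonstration only.
    List<String> regions = new ArrayList<>();
    for (int i = 0; i < 251; i++) {
      regions.add("region-" + i);
    }
    List<List<String>> batches = batch(regions, 100);
    System.out.println(batches.size() + " merge batches");
  }
}
```

With a cap like this, a single slow or failing merge would affect at most one batch rather than the whole table; what a sensible cap would be is left open here.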


Logs

Normalizer triggers a merge procedure

normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo={ENCODED => 
6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,XXXX.', STARTKEY => 
'XXXX', ENDKEY => 'YYYY'},regionSizeMb=0], 
NormalizationTarget[regionInfo={ENCODED => 79607df308d7618e632abe8a12c1bf6b, 
NAME => 'TEST.TEST_TABLE,XXXX', STARTKEY => 'XXYY', ENDKEY => 
'YYZZ'},regionSizeMb=0]]] resulting in *pid 21968356*

Procedure immediately gets stuck

procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run 
time 12.4850 sec

Finally fails after ~40 mins

procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run 
time *40 mins, 58.055 sec*


Bails out with RuntimeException

procedure2.ProcedureExecutor - force=false
java.lang.UnsupportedOperationException: pid=21968356, 
state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true, 
exception=java.lang.RuntimeException via CODE-BUG: Uncaught runtime 
exception: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, 
locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLEXXXX, 
regions=[269a1b168af497cce9ba6d3d581568f2
.
.
.
.
*27K+ regions printed here]*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
