Ravi Kishore Valeti created HBASE-28068:
-------------------------------------------
Summary: Normalizer should batch merging 0 sized regions
Key: HBASE-28068
URL: https://issues.apache.org/jira/browse/HBASE-28068
Project: HBase
Issue Type: Improvement
Components: Normalizer
Affects Versions: 2.5.5
Reporter: Ravi Kishore Valeti
Fix For: 2.6.0, 3.0.0
In our production environment, while investigating an issue, we observed that
the Normalizer had submitted a single merge procedure to an RS, passing it 27K+
empty regions of a table to merge (the empty regions were left behind by a
failed copy table job).
As a result, the procedure got stuck, and the procedure framework eventually
bailed out after ~40 mins. This recurred on every normalizer run until we
deleted the table manually.
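To illustrate the batching asked for in the summary, here is a minimal sketch
that splits a long run of adjacent empty regions into bounded groups, so that
no single merge request carries thousands of regions. It is not the HBase
implementation: the class EmptyRegionMergeBatcher, the method batch, and the
cap maxRegionsPerMerge are hypothetical names chosen for this example, and
regions are represented by plain strings rather than RegionInfo objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: groups adjacent zero-sized regions
// into bounded batches so each merge request stays small.
final class EmptyRegionMergeBatcher {

    private EmptyRegionMergeBatcher() {}

    // regionNames: encoded names of adjacent empty regions, in key order.
    // maxRegionsPerMerge: assumed cap on regions per merge request (>= 2).
    static List<List<String>> batch(List<String> regionNames, int maxRegionsPerMerge) {
        if (maxRegionsPerMerge < 2) {
            throw new IllegalArgumentException("a merge needs at least 2 regions");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < regionNames.size(); start += maxRegionsPerMerge) {
            int end = Math.min(start + maxRegionsPerMerge, regionNames.size());
            // A lone trailing region cannot be merged by itself; it is skipped
            // here and would be picked up by a later normalizer run.
            if (end - start >= 2) {
                batches.add(new ArrayList<>(regionNames.subList(start, end)));
            }
        }
        return batches;
    }
}
```

With a cap of 100, for example, 27,000 empty regions would yield 270 merge
requests of 100 regions each instead of one request with 27,000 regions. The
right cap for a real cluster is an open question that this sketch does not
answer.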
Logs
Normalizer triggers a merge procedure
normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo={ENCODED =>
6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,XXXX.', STARTKEY =>
'XXXX', ENDKEY => 'YYYY'},regionSizeMb=0],
NormalizationTarget[regionInfo={ENCODED => 79607df308d7618e632abe8a12c1bf6b,
NAME => 'TEST.TEST_TABLE,XXXX', STARTKEY => 'XXYY', ENDKEY =>
'YYZZ'},regionSizeMb=0]]] resulting in *pid 21968356*
Procedure immediately gets stuck
procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run
time 12.4850 sec
Finally fails after ~40 mins
procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run
time *40 mins, 58.055 sec*
Bails out with RuntimeException
procedure2.ProcedureExecutor - force=false
java.lang.UnsupportedOperationException: pid=21968356,
state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true,
exception=java.lang.RuntimeException via CODE-BUG: Uncaught runtime
exception: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META,
locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLEXXXX,
regions=[269a1b168af497cce9ba6d3d581568f2
.
.
.
.
*27K+ regions printed here]*
--
This message was sent by Atlassian Jira
(v8.20.10#820010)