[
https://issues.apache.org/jira/browse/HBASE-28068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764326#comment-17764326
]
Viraj Jasani edited comment on HBASE-28068 at 9/12/23 6:07 PM:
---------------------------------------------------------------
In fact, the config limit can be applied during plan computation (i.e.
{_}computeMergeNormalizationPlans(){_}).
For instance, we can limit the size of rangeMembers here:
{code:java}
...
...
...
if (
    rangeMembers.isEmpty() // when there are no range members, seed the range with whatever
                           // we have. this way we're prepared in case the next region is
                           // 0-size.
    || (rangeMembers.size() == 1 && sumRangeMembersSizeMb == 0) // when there is only one
                                                                // region and the size is 0,
                                                                // seed the range with
                                                                // whatever we have.
    || regionSizeMb == 0 // always add an empty region to the current range.
    || (regionSizeMb + sumRangeMembersSizeMb <= avgRegionSizeMb)
) { // add the current region to the range when there's capacity remaining.
  rangeMembers.add(new NormalizationTarget(regionInfo, regionSizeMb));
  sumRangeMembersSizeMb += regionSizeMb;
  continue;
}
...
...
...
{code}
Once {_}rangeMembers.size(){_} reaches the configured limit, we don't need to
compute any further.
Though this is for the merge plan, the same idea could be applied more
generally, e.g. to _computeSplitNormalizationPlans()_ as well.
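A minimal, self-contained sketch of where such a cap could sit in the range-building condition. Everything here is a hypothetical simplification, not the actual SimpleRegionNormalizer code: _Target_ stands in for _NormalizationTarget_, _maxRegionsPerMerge_ stands in for the configured limit (no config key is named in this comment), and the outer loop is flattened into a single pass that reseeds a new range with the region that did not fit.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class MergePlanSketch {

  // Hypothetical stand-in for HBase's NormalizationTarget: a region name and its size.
  public record Target(String region, long sizeMb) {}

  /**
   * Groups contiguous regions into merge ranges using the same seeding conditions as the
   * snippet above, but stops adding to a range once it holds maxRegionsPerMerge members.
   * maxRegionsPerMerge is a hypothetical parameter standing in for the configured limit.
   */
  public static List<List<Target>> computeMergeRanges(
      List<Target> regions, long avgRegionSizeMb, int maxRegionsPerMerge) {
    if (maxRegionsPerMerge < 2) {
      throw new IllegalArgumentException("a merge needs at least 2 regions");
    }
    List<List<Target>> plans = new ArrayList<>();
    List<Target> rangeMembers = new ArrayList<>();
    long sumRangeMembersSizeMb = 0;
    for (Target t : regions) {
      long regionSizeMb = t.sizeMb();
      // The cap is checked first, so even 0-size regions stop joining a full range.
      boolean underCap = rangeMembers.size() < maxRegionsPerMerge;
      if (underCap && (rangeMembers.isEmpty()
          || (rangeMembers.size() == 1 && sumRangeMembersSizeMb == 0)
          || regionSizeMb == 0
          || regionSizeMb + sumRangeMembersSizeMb <= avgRegionSizeMb)) {
        rangeMembers.add(t);
        sumRangeMembersSizeMb += regionSizeMb;
        continue;
      }
      // Range is closed (cap reached or no size capacity left): keep it if mergeable.
      if (rangeMembers.size() > 1) {
        plans.add(rangeMembers);
      }
      // Seed the next range with the region that did not fit.
      rangeMembers = new ArrayList<>();
      rangeMembers.add(t);
      sumRangeMembersSizeMb = regionSizeMb;
    }
    if (rangeMembers.size() > 1) {
      plans.add(rangeMembers);
    }
    return plans;
  }

  public static void main(String[] args) {
    // Illustrative input at roughly the scale reported in this issue: 27,000 empty regions.
    List<Target> regions = new ArrayList<>();
    for (int i = 0; i < 27_000; i++) {
      regions.add(new Target("region-" + i, 0));
    }
    List<List<Target>> uncapped = computeMergeRanges(regions, 0, Integer.MAX_VALUE);
    List<List<Target>> capped = computeMergeRanges(regions, 0, 100);
    // prints "uncapped: 1 plan(s), first has 27000 regions"
    System.out.println("uncapped: " + uncapped.size() + " plan(s), first has "
        + uncapped.get(0).size() + " regions");
    // prints "capped: 270 plan(s), first has 100 regions"
    System.out.println("capped: " + capped.size() + " plan(s), first has "
        + capped.get(0).size() + " regions");
  }
}
{code}

With no cap, all empty regions collapse into one merge plan, which is the single huge procedure described in this issue; with a cap of 100 (an arbitrary illustrative value) the same input yields 270 plans of 100 regions each.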
> Normalizer should batch merging 0 sized/empty regions
> -----------------------------------------------------
>
> Key: HBASE-28068
> URL: https://issues.apache.org/jira/browse/HBASE-28068
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 2.5.5
> Reporter: Ravi Kishore Valeti
> Assignee: Rahul Kumar
> Priority: Minor
> Fix For: 2.6.0, 2.5.6, 3.0.0-beta-1
>
>
> In our production environment, while investigating an issue, we observed that
> the Normalizer had scheduled a single merge procedure on an RS covering
> 27K+ empty regions of a table (left behind by a failed copy table job).
> This caused the procedure to get stuck, and the procedure framework
> eventually bailed out after ~40 mins. This recurred with every normalizer
> run until we deleted the table manually.
> Logs
> Normalizer triggers a merge procedure
> normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo=\{ENCODED
> => 6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,XXXX.',
> STARTKEY => 'XXXX', ENDKEY => 'YYYY'},{*}regionSizeMb=0{*}],
> NormalizationTarget[regionInfo=\{ENCODED => 79607df308d7618e632abe8a12c1bf6b,
> NAME => 'TEST.TEST_TABLE,XXXX', STARTKEY => 'XXYY', ENDKEY =>
> 'YYZZ'},{*}regionSizeMb=0]{*}]] resulting in *pid 21968356*
> procedure immediately gets stuck
> procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run
> time 12.4850 sec
> Finally fails after ~40 mins
> procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run
> time *40 mins, 58.055 sec*
> Bails out with RuntimeException
> procedure2.ProcedureExecutor - force=false
> java.lang.UnsupportedOperationException: pid=21968356,
> state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true,
> exception=java.lang.{*}RuntimeException via CODE-BUG: Uncaught runtime
> exception{*}: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META,
> locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLEXXXX,
> {*}regions={*}{*}[269a1b168af497cce9ba6d3d581568f2{*}
> .
> .
> .
> .
> *27K+ regions printed here]*
--
This message was sent by Atlassian Jira
(v8.20.10#820010)