[
https://issues.apache.org/jira/browse/HBASE-27496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636005#comment-17636005
]
Bryan Beaudreault edited comment on HBASE-27496 at 11/18/22 7:43 PM:
---------------------------------------------------------------------
One way around that problem: currently you calculate the total region size or
total range size from the hri and rangeMembers respectively, and then bail out
of the calculation early once the total is exceeded. Instead, you could add a
getPlanSize() method to NormalizationPlan and store that calculated size on the
returned plan objects.
Then you'd use that size to enforce the limit _after_ computing all plans:
shuffle all of the accumulated plans at the end, then iterate over them, summing
getPlanSize(), until the total threshold is reached.
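A minimal Java sketch of that post-hoc limiting step, under stated assumptions: SizedPlan, SimplePlan, limitPlans, and PlanLimitSketch are hypothetical stand-ins for NormalizationPlan and its implementations, not HBase classes, and the sketch stops at the first shuffled plan that would push the running total past the limit, which is one reading of "until the total threshold is reached".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch only: these types stand in for NormalizationPlan and its
// implementations; they are not HBase's actual classes or signatures.
public class PlanLimitSketch {

  /** Hypothetical stand-in for NormalizationPlan plus the proposed getPlanSize(). */
  interface SizedPlan {
    /** Total bytes of the regions this plan would split or merge. */
    long getPlanSize();
  }

  /** Hypothetical plan that stores the size computed while the plan was generated. */
  static final class SimplePlan implements SizedPlan {
    private final String description;
    private final long planSize;

    SimplePlan(String description, long planSize) {
      this.description = description;
      this.planSize = planSize;
    }

    @Override
    public long getPlanSize() {
      return planSize;
    }

    @Override
    public String toString() {
      return description + " (" + planSize + " bytes)";
    }
  }

  /**
   * Shuffles a copy of the accumulated plans, then keeps plans in that order until
   * adding the next one would push the running total past maxTotalBytes.
   */
  static <P extends SizedPlan> List<P> limitPlans(List<P> plans, long maxTotalBytes, Random random) {
    List<P> shuffled = new ArrayList<>(plans); // copy so the caller's list is untouched
    Collections.shuffle(shuffled, random);
    List<P> selected = new ArrayList<>();
    long total = 0;
    for (P plan : shuffled) {
      long size = plan.getPlanSize();
      if (total + size > maxTotalBytes) {
        break; // threshold reached: drop this plan and all remaining ones
      }
      total += size;
      selected.add(plan);
    }
    return selected;
  }

  public static void main(String[] args) {
    long gib = 1L << 30;
    List<SimplePlan> plans = List.of(
        new SimplePlan("merge r1+r2", 4 * gib),
        new SimplePlan("split r3", 3 * gib),
        new SimplePlan("merge r4+r5", 2 * gib));
    // Which plans survive depends on the shuffle, but their total never exceeds 6 GiB.
    List<SimplePlan> kept = limitPlans(plans, 6 * gib, new Random());
    System.out.println("Executing " + kept);
  }
}
```

Shuffling before truncating keeps the same tables or regions from always winning the budget on every run. Replacing the break with continue would instead pack smaller plans into the leftover budget; which behavior is preferable is a design choice the sketch does not settle.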
NormalizationPlan interface and implementations are all
InterfaceAudience.Private so should be easy to modify as necessary.
edit: realized I had left some incomplete and confusing sentences in
there.
was (Author: bbeaudreault):
One way around that problem: currently you calculate the total region size or
total range size from the hri and rangeMembers respectively. Both of those are
already being set on the returned SplitNormalizationPlan and
MergeNormalizationPlan, so you could store your calculated planSize on them too.
What you could do is enforce the limit _after_ computing all plans. Add a
getPlanSize() method to the NormalizationPlan interface. Shuffle all of the
accumulated plans, then iterate over them calling the new getPlanSize() until the
total threshold is reached.
NormalizationPlan interface and implementations are all
InterfaceAudience.Private so should be easy to modify as necessary.
> Limit size of plans produced by SimpleRegionNormalizer
> ------------------------------------------------------
>
> Key: HBASE-27496
> URL: https://issues.apache.org/jira/browse/HBASE-27496
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Reporter: Charles Connell
> Priority: Minor
>
> My company (Hubspot) is starting to use {{SimpleRegionNormalizer}}. We
> turn the normalizer switch on for 30 minutes each day, when our database
> traffic is at a low point. We're using the
> {{hbase.normalizer.throughput.max_bytes_per_sec}} setting to create a rate
> limit. I've found that while the {{SimpleRegionNormalizer}} only produces new
> plans for 30 minutes each day, the plans often take many hours to execute.
> This leads to region splits, merges, and moves occurring in our HBase clusters
> during hours when we'd prefer them not to.
> I propose two new settings:
> * {{hbase.normalizer.merge.plans_size_limit.mb}}
> * {{hbase.normalizer.split.plans_size_limit.mb}}
> This will allow HBase administrators to limit the number of plans produced by
> a run of {{SimpleRegionNormalizer}} by forcing it to stop producing new
> plans once the cumulative region size limits are exceeded. This gives
> administrators a way to approximately limit how long the plans take to execute.
> Because plan execution time is currently governed primarily by a per-byte rate
> limit, I propose that the new settings also work on a per-byte basis. This
> will make it feasible to reason about how your rate limit and your size
> limits interact.
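To make that interaction concrete, here is a back-of-the-envelope Java sketch. It assumes that the proposed {{hbase.normalizer.merge.plans_size_limit.mb}} and {{hbase.normalizer.split.plans_size_limit.mb}} cap the bytes produced per run, that "mb" means MiB, and that {{hbase.normalizer.throughput.max_bytes_per_sec}} throttles all of those bytes together. The class and method names are hypothetical, and the result ignores any work that is not throttled by the rate limit.

```java
// Hypothetical back-of-the-envelope estimate; not part of HBase.
public class NormalizerBudgetEstimate {

  static final long BYTES_PER_MB = 1024L * 1024L; // assumption: "mb" means MiB

  /** Approximate seconds needed to execute one run's plans at the given rate. */
  static double estimatedSeconds(long mergeLimitMb, long splitLimitMb, long maxBytesPerSec) {
    if (maxBytesPerSec <= 0) {
      throw new IllegalArgumentException("maxBytesPerSec must be positive");
    }
    long totalBytes = (mergeLimitMb + splitLimitMb) * BYTES_PER_MB;
    return (double) totalBytes / maxBytesPerSec;
  }

  /** Combined size budget (MiB) that the given rate can execute within windowSeconds. */
  static long combinedLimitMbForWindow(long windowSeconds, long maxBytesPerSec) {
    return windowSeconds * maxBytesPerSec / BYTES_PER_MB;
  }

  public static void main(String[] args) {
    // Example values only: 10 GiB of merges plus 10 GiB of splits at 10 MiB/s.
    double seconds = estimatedSeconds(10_240, 10_240, 10 * BYTES_PER_MB);
    System.out.printf("~%.0f s (%.1f min)%n", seconds, seconds / 60); // ~2048 s (34.1 min)
    // Budget that fits a 30-minute window at the same rate.
    System.out.println(combinedLimitMbForWindow(30 * 60, 10 * BYTES_PER_MB) + " MiB");
  }
}
```

Read in reverse, the second method shows how an administrator could derive size limits from the window they want plans to finish in: at 10 MiB/s, a 30-minute window corresponds to about 18000 MiB split between merges and splits.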