[
https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295860#comment-17295860
]
David Manning commented on HBASE-25625:
---------------------------------------
I'm excited about working towards a balancer that works better for large
clusters! Thanks for proposing changes in that direction.
I agree that the TableSkewCostFunction seems limited in its current form of
only tracking the max regions on any given server.
For the other cost functions, I'm having a hard time working through the math
and seeing the benefit, though. For example, if I take an 11-node cluster with
100 regions per server on average:
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100
And one node goes down, then I see:
110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0
With sum of deviation (old computation), it is (110 - 100) * 10 + (100 - 0) * 1
= 200. The maximum possible deviation would be all 1100 regions on one server,
for (100 - 0) * 10 + (1100 - 100) * 1 = 2000. So the scaled cost would be
200 / 2000 = 0.1.
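For anyone who wants to replay the arithmetic, here is a minimal standalone
sketch of that old-style scaling (sum of absolute deviations, scaled by the
worst case of every region piled onto one server). It is only an illustration
of the idea, dropped into a scratch class, not the actual CostFunction code:
{code:java}
// Sketch only: old-style cost = sum of |count - mean| across servers,
// scaled by the worst case where a single server holds every region.
static double sumOfDeviationCost(int[] regionsPerServer) {
  int n = regionsPerServer.length;
  double total = 0;
  for (int c : regionsPerServer) {
    total += c;
  }
  double mean = total / n;
  double sumDev = 0;
  for (int c : regionsPerServer) {
    sumDev += Math.abs(c - mean);
  }
  // Worst case: (n - 1) empty servers plus one server with all the regions.
  double maxDev = (n - 1) * mean + (total - mean);
  return sumDev / maxDev;
}
{code}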
With stdev (new computation), it also gives a scaled cost of 0.1: stdev =
sqrt(((110 - 100) ^ 2 * 10 + (0 - 100) ^ 2 * 1) / 11) = sqrt(1000), and the
maximum possible stdev = sqrt(((0 - 100) ^ 2 * 10 + (1100 - 100) ^ 2 * 1) / 11)
= sqrt(100000), so the cost is sqrt(1000) / sqrt(100000) = 0.1.
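And the stdev-based variant, scaled against the same worst-case distribution
(again just a sketch of the proposal as I understand it, for the same scratch
class):
{code:java}
// Sketch only: new-style cost = population stdev of the counts,
// scaled by the stdev of the worst case where one server holds every region.
static double stdevCost(int[] regionsPerServer) {
  int n = regionsPerServer.length;
  double total = 0;
  for (int c : regionsPerServer) {
    total += c;
  }
  double mean = total / n;
  double sumSq = 0;
  for (int c : regionsPerServer) {
    sumSq += (c - mean) * (c - mean);
  }
  double stdev = Math.sqrt(sumSq / n);
  double maxStdev =
      Math.sqrt(((n - 1) * mean * mean + (total - mean) * (total - mean)) / n);
  return stdev / maxStdev;
}
{code}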
If another server goes down and its regions are distributed round-robin, the
cluster state would look like:
121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11
If I did the math right, then I see:
old computation: 378 / 2000 = 0.189
new computation: 0.140
So the stdev-based calculation is less likely to trigger balancing in these
scenarios.
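A quick check of both scenarios with the two sketch methods above:
{code:java}
// e.g. in a scratch main():
int[] oneDown = {110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0};
int[] twoDown = {121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11};
System.out.println(sumOfDeviationCost(oneDown)); // 0.1
System.out.println(stdevCost(oneDown));          // ~0.1
System.out.println(sumOfDeviationCost(twoDown)); // 0.189
System.out.println(stdevCost(twoDown));          // ~0.14
{code}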
How big does the cluster have to get to benefit from the new calculations? I
tried 100 nodes with 1000 regions per node. One node at 0 results in 0.01 cost
in both old and new calculations. Two nodes down (assuming round-robin
distribution again) gives me 0.019 for the old calculation and 0.014 for the
new stdev calculation.
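In case anyone wants to reproduce those larger numbers, a throwaway helper like
the one below works (drainOne is just a made-up name, and the exact values
shift slightly depending on how the round-robin hand-off is modelled):
{code:java}
// Throwaway helper: empty the victim server and hand its regions out
// round-robin to every other server (including previously emptied ones).
static int[] drainOne(int[] counts, int victim) {
  int[] next = counts.clone();
  int toMove = next[victim];
  next[victim] = 0;
  int target = 0;
  while (toMove > 0) {
    if (target != victim) {
      next[target]++;
      toMove--;
    }
    target = (target + 1) % next.length;
  }
  return next;
}

// e.g. in a scratch main(): 100 nodes, 1000 regions each, then 1 and 2 drained.
int[] cluster = new int[100];
java.util.Arrays.fill(cluster, 1000);
int[] oneDrained = drainOne(cluster, 99);
int[] twoDrained = drainOne(oneDrained, 98);
System.out.printf("%.3f %.3f%n",
    sumOfDeviationCost(oneDrained), stdevCost(oneDrained));
System.out.printf("%.3f %.3f%n",
    sumOfDeviationCost(twoDrained), stdevCost(twoDrained));
{code}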
> StochasticBalancer CostFunctions needs a better way to evaluate resource
> distribution
> -------------------------------------------------------------------------------------
>
> Key: HBASE-25625
> URL: https://issues.apache.org/jira/browse/HBASE-25625
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, master
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
>
> Currently CostFunctions including RegionCountSkewCostFunction,
> PrimaryRegionCountSkewCostFunction and all load cost functions calculate the
> unevenness of the distribution by getting the sum of deviation per region
> server. This simple implementation works when the cluster is small. But when
> the cluster gets larger with more region servers and regions, it doesn't work
> well with hot spots or a small number of unbalanced servers. The proposal is
> to use the standard deviation of the count per region server to capture the
> existence of a small portion of region servers with overwhelming
> load/allocation.
> TableSkewCostFunction uses, summed over all tables, the max deviation of a
> table's region count on any single server as the measure of unevenness. It
> doesn't work well in a very common operational scenario. Say we have 100
> regions on 50 nodes, two on each. We
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1,
> compared to 99 in the worst case scenario of 100 regions on a single server.
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer
> wouldn't move. The proposal is to use the standard deviation of the count
> per region server to detect this scenario, generating a cost of 3.1/31 = 0.1
> in this case.
> Patch is in test and will follow shortly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)