[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295860#comment-17295860 ]

David Manning commented on HBASE-25625:
---------------------------------------

I'm excited to see work towards a balancer that handles large clusters better! 
Thanks for proposing changes in that direction.

I agree that the TableSkewCostFunction seems limited in its current form, which 
only tracks the max regions on any given server.

For the other cost functions, I'm having a hard time working through the math 
and seeing the benefit, though. For example, if I take an 11-node cluster with 
100 regions per server on average:

100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100

And one node goes down, then I see:

110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0

With sum of deviation (old computation), the cost is (110 - 100) * 10 + (100 - 0) * 1 
= 200. The maximum possible deviation corresponds to all 1100 regions on one server, 
i.e. (100 - 0) * 10 + (1100 - 100) * 1 = 2000. So the scaled cost would be 
200 / 2000 = 0.1.

With stdev (new computation): stdev = sqrt(((110 - 100) ^ 2 * 10 + (0 - 100) ^ 2 * 1) 
/ 11) = sqrt(1000), and the maximum possible stdev = sqrt(((0 - 100) ^ 2 * 10 + 
(1100 - 100) ^ 2 * 1) / 11) = sqrt(100000), so the scaled cost is again 
sqrt(1000) / sqrt(100000) = 0.1.
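
In case it helps anyone double-check the arithmetic, here is a rough, self-contained 
sketch of how I computed both scalings (this is not the actual StochasticLoadBalancer 
code; the class and method names are just mine for illustration):

  import java.util.Arrays;

  public class BalancerCostSketch {

    // Old-style cost: sum of |count - mean| across servers, scaled by the
    // worst case where all regions sit on a single server.
    static double sumOfDeviationCost(int[] regions) {
      double mean = Arrays.stream(regions).average().orElse(0);
      long total = Arrays.stream(regions).sum();
      double cost = 0;
      for (int r : regions) {
        cost += Math.abs(r - mean);
      }
      double maxCost = (regions.length - 1) * mean + (total - mean);
      return cost / maxCost;
    }

    // Proposed cost: stdev of the counts, scaled by the worst-case stdev
    // (again, all regions on a single server).
    static double stdevCost(int[] regions) {
      double mean = Arrays.stream(regions).average().orElse(0);
      long total = Arrays.stream(regions).sum();
      double var = 0;
      for (int r : regions) {
        var += (r - mean) * (r - mean);
      }
      double stdev = Math.sqrt(var / regions.length);
      double maxVar = (regions.length - 1) * mean * mean
          + (total - mean) * (total - mean);
      double maxStdev = Math.sqrt(maxVar / regions.length);
      return stdev / maxStdev;
    }

    public static void main(String[] args) {
      int[] oneDown = {110, 110, 110, 110, 110, 110, 110, 110, 110, 110, 0};
      System.out.println(sumOfDeviationCost(oneDown)); // 0.1
      System.out.println(stdevCost(oneDown));          // ~0.1
    }
  }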

If another server goes down and its regions are distributed round-robin, the 
cluster state would look like:

121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11

If I did the math right, then I see:

old computation: 378 / 2000 = 0.189

new computation: sqrt(1990) / sqrt(100000) ≈ 0.141

So the stdev-based calculation is less likely to trigger balancing in these scenarios.
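
Plugging the second cluster state into the same main method above gives the same numbers:

    int[] twoDown = {121, 121, 121, 121, 121, 121, 121, 121, 121, 0, 11};
    System.out.println(sumOfDeviationCost(twoDown)); // 0.189
    System.out.println(stdevCost(twoDown));          // ~0.141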

How big does the cluster have to get to benefit from the new calculations? I 
tried 100 nodes with 1000 regions per node. One node at 0 results in 0.01 cost 
in both old and new calculations. Two nodes down (assuming round-robin 
balancing again) gives me 0.019 for the old calculation and 0.014 for the new 
stdev calculation.
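
For the single-node-down case in the 100-node cluster, assuming the 100,000 regions 
end up spread as evenly as possible across the 99 surviving servers (ten at 1011, 
the rest at 1010 -- an assumption on my part about the exact split), the same sketch 
gives roughly 0.0101 for both scalings:

    int[] bigOneDown = new int[100];
    Arrays.fill(bigOneDown, 1010);
    for (int i = 0; i < 10; i++) {
      bigOneDown[i] = 1011;   // ten servers pick up one extra region
    }
    bigOneDown[99] = 0;       // the downed server
    System.out.println(sumOfDeviationCost(bigOneDown)); // ~0.0101
    System.out.println(stdevCost(bigOneDown));          // ~0.0101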

> StochasticBalancer CostFunctions needs a better way to evaluate resource 
> distribution
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-25625
>                 URL: https://issues.apache.org/jira/browse/HBASE-25625
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, master
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>
> Currently CostFunctions including RegionCountSkewCostFunctions, 
> PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the 
> unevenness of the distribution by getting the sum of deviation per region 
> server. This simple implementation works when the cluster is small. But when 
> the cluster gets larger with more region servers and regions, it doesn't work 
> well with hot spots or a small number of unbalanced servers. The proposal is 
> to use the standard deviation of the count per region server to capture the 
> existence of a small portion of region servers with overwhelming 
> load/allocation.
> TableSkewCostFunction uses the sum of the max deviation region per server for 
> all tables as the measure of unevenness. It doesn't work in a very common 
> scenario in operations. Say we have 100 regions on 50 nodes, two on each. We 
> add 50 new nodes and they have 0 each. The max deviation from the mean is 1, 
> compared to 99 in the worst case scenario of 100 regions on a single server. 
> The normalized cost is 1/99 = 0.011 < default threshold of 0.05. Balancer 
> wouldn't move.  The proposal is to use the standard deviation of the count 
> per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 
> in this case.
> Patch is in test and will follow shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
