[ 
https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clara Xiong updated HBASE-25625:
--------------------------------
    Description: 
Currently CostFunctions including RegionCountSkewCostFunctions, 
PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the 
unevenness of the distribution as the sum of the deviations per region 
server. This simple implementation works when the cluster is small. But when 
the cluster gets larger, with more region servers and regions, it doesn't work 
well with hot spots or a small number of unbalanced servers. The proposal is to 
use the standard deviation of the count per region server to capture the 
existence of a small portion of region servers with overwhelming 
load/allocation.
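
One way to read the paragraph above, as a minimal sketch rather than the 
balancer's actual code: two made-up distributions over 100 region servers 
have the same sum of absolute deviations from the mean, yet only one of them 
contains a hot spot. The helper names and the sample counts are hypothetical 
and chosen purely for illustration.

```python
import math


def sum_abs_deviation(counts):
    """Sum of |count - mean| over all region servers."""
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts)


def std_deviation(counts):
    """Population standard deviation of the per-server counts."""
    mean = sum(counts) / len(counts)
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))


# Hypothetical data: 1000 regions on 100 servers, mean of 10 per server.
# "mild": every server is off by exactly one region.
mild = [11] * 50 + [9] * 50
# "hot_spot": one server holds 60 regions, the rest are at or just below the mean.
hot_spot = [60] + [9] * 50 + [10] * 49

for name, counts in (("mild", mild), ("hot spot", hot_spot)):
    print(name, sum_abs_deviation(counts), round(std_deviation(counts), 2))
```

Both distributions give a sum of deviations of 100, while the standard 
deviation is 1.0 for the mild case and about 5.05 for the hot spot, so only 
the second measure separates the two.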

TableSkewCostFunction uses the sum, over all tables, of the maximum deviation 
of the per-server region count as the measure of unevenness. It doesn't work 
in a very common scenario in operations. Say we have 100 regions on 50 nodes, 
two on each. We add 50 new nodes and they have 0 each. The max deviation from 
the mean is 1, compared to 99 in the worst-case scenario of 100 regions on a 
single server. The normalized cost is 1/99 = 0.0101 < default threshold of 
0.05, so the balancer wouldn't move any regions. The proposal is to use the 
standard deviation of the count per region server to detect this scenario, 
generating a cost of 3.1/31 = 0.1 in this case.
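
The arithmetic in this scenario can be checked with a short sketch. It is not 
the HBase implementation: the function names are hypothetical, and 
normalizing each measure by its value in the worst case (all regions on one 
server) is an assumption made here. The issue does not say how its 3.1 and 31 
figures are scaled, so the sketch only reproduces their ratio of roughly 0.1.

```python
import math

THRESHOLD = 0.05  # default threshold quoted in the description


def std_deviation(counts):
    """Population standard deviation of the per-server counts."""
    mean = sum(counts) / len(counts)
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))


def max_deviation_cost(counts):
    """Max deviation from the mean, scaled by the worst case (assumed normalization)."""
    total = sum(counts)
    mean = total / len(counts)
    worst = total - mean  # all regions on a single server
    return max(abs(c - mean) for c in counts) / worst


def std_deviation_cost(counts):
    """Standard deviation, scaled by the worst case (assumed normalization)."""
    worst_case = [sum(counts)] + [0] * (len(counts) - 1)
    return std_deviation(counts) / std_deviation(worst_case)


# 100 regions: two on each of the 50 old nodes, none on the 50 new nodes.
counts = [2] * 50 + [0] * 50

for name, cost in (("max deviation", max_deviation_cost(counts)),
                   ("standard deviation", std_deviation_cost(counts))):
    print(f"{name}: cost={cost:.4f}, above threshold: {cost > THRESHOLD}")
```

Under these assumptions the max-deviation cost is 1/99, about 0.0101, which 
stays below the threshold, while the standard-deviation cost is about 0.1005 
and exceeds it.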

Patch is in test and will follow shortly.

  was:
Currently CostFunctions including RegionCountSkewCostFunctions, 
PrimaryRegionCountSkewCostFunctions and all load cost functions calculate how 
uneven the distribution is by getting the sum of deviation per region server. 
TableSkewCostFunction uses the sum of the max region per server for all tables 
as the measure of unevenness. 

This simple implementation works when the cluster is small. But when the 
cluster gets larger, with more region servers and regions, it doesn't work well 
with hot spots or a small number of unbalanced servers.

The proposal is to use the standard deviation of the count per region server to 
capture the existence of a small portion of region servers with overwhelming 
load/allocation.

Patch is in test and will follow shortly.


> StochasticBalancer CostFunctions needs a better way to evaluate resource 
> distribution
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-25625
>                 URL: https://issues.apache.org/jira/browse/HBASE-25625
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, master
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>
> Currently CostFunctions including RegionCountSkewCostFunctions, 
> PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the 
> unevenness of the distribution as the sum of the deviations per region 
> server. This simple implementation works when the cluster is small. But when 
> the cluster gets larger, with more region servers and regions, it doesn't work 
> well with hot spots or a small number of unbalanced servers. The proposal is 
> to use the standard deviation of the count per region server to capture the 
> existence of a small portion of region servers with overwhelming 
> load/allocation.
> TableSkewCostFunction uses the sum, over all tables, of the maximum deviation 
> of the per-server region count as the measure of unevenness. It doesn't work 
> in a very common scenario in operations. Say we have 100 regions on 50 nodes, 
> two on each. We add 50 new nodes and they have 0 each. The max deviation from 
> the mean is 1, compared to 99 in the worst-case scenario of 100 regions on a 
> single server. The normalized cost is 1/99 = 0.0101 < default threshold of 
> 0.05, so the balancer wouldn't move any regions. The proposal is to use the 
> standard deviation of the count per region server to detect this scenario, 
> generating a cost of 3.1/31 = 0.1 in this case.
> Patch is in test and will follow shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
