[ 
https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294923#comment-17294923
 ] 

Clara Xiong commented on HBASE-25625:
-------------------------------------

Unit tests updated to reflect the new algorithm.

Tests performed on a 12-node cluster:
 # Outliers. I brought down a node and restarted it 10 minutes later. The new 
balancer worked perfectly. The old computation would have ignored the imbalance 
because its measurement of 0.046 was below the default threshold of 0.05. The 
new implementation computed the imbalance of region count per server at 0.078, 
which is above 0.05, and therefore moved the regions. 
 # Load functions. I tested with the majority of the weight given to a load 
function such as storefiles. The new balancer also worked.
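
To illustrate the difference between the two measures discussed in the issue description, here is a minimal sketch in Java. It is not the HBase implementation: the class name SkewMeasures, both method names, and the region counts are hypothetical, and the normalization that the balancer applies before comparing against the 0.05 threshold is left out. It only shows that two distributions with the same summed deviation can have different standard deviations when the imbalance is concentrated on one server.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from the HBase code base.
public class SkewMeasures {

    // Sum of absolute deviations from the mean region count per server.
    static double sumOfAbsoluteDeviation(int[] counts) {
        double mean = Arrays.stream(counts).average().orElse(0.0);
        return Arrays.stream(counts).mapToDouble(c -> Math.abs(c - mean)).sum();
    }

    // Population standard deviation of the region count per server.
    static double standardDeviation(int[] counts) {
        double mean = Arrays.stream(counts).average().orElse(0.0);
        double variance = Arrays.stream(counts)
            .mapToDouble(c -> (c - mean) * (c - mean))
            .average()
            .orElse(0.0);
        return Math.sqrt(variance);
    }

    // Made-up region counts for 12 servers, 1200 regions in total.
    // Mild spread: ten servers are 11 regions away from the mean.
    static final int[] MILD_SPREAD =
        {111, 111, 111, 111, 111, 89, 89, 89, 89, 89, 100, 100};
    // One outlier: a single server carries 55 regions more than the mean.
    static final int[] ONE_OUTLIER =
        {155, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95};

    public static void main(String[] args) {
        System.out.println("mild spread: sum of deviation = "
            + sumOfAbsoluteDeviation(MILD_SPREAD)
            + ", standard deviation = " + standardDeviation(MILD_SPREAD));
        System.out.println("one outlier: sum of deviation = "
            + sumOfAbsoluteDeviation(ONE_OUTLIER)
            + ", standard deviation = " + standardDeviation(ONE_OUTLIER));
    }
}
```

With these made-up counts, both distributions have a summed deviation of 110, while the standard deviation is roughly 10.0 for the mild spread and roughly 16.6 for the single outlier. Squaring the deviations is what weights the concentrated imbalance more heavily.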

> StochasticBalancer CostFunctions needs a better way to evaluate resource 
> distribution
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-25625
>                 URL: https://issues.apache.org/jira/browse/HBASE-25625
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer, master
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>            Priority: Major
>         Attachments: Screen Shot 2021-03-03 at 12.01.12 PM.png, Screen Shot 
> 2021-03-03 at 12.08.58 PM.png
>
>
> Currently, CostFunctions including RegionCountSkewCostFunctions, 
> PrimaryRegionCountSkewCostFunctions and all load cost functions calculate how 
> uneven the distribution is by summing the deviation per region server. 
> TableSkewCostFunction uses the sum of the max regions per server for all 
> tables as the measure of unevenness. 
> This simple implementation works when the cluster is small. But when the 
> cluster gets larger, with more region servers and regions, it doesn't work 
> well with hot spots or a small number of unbalanced servers.
> The proposal is to use the standard deviation of the count per region server 
> to capture the existence of a small portion of region servers with 
> overwhelming load/allocation.
> The patch is in test and will follow shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
