[ 
https://issues.apache.org/jira/browse/HBASE-25882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha updated HBASE-25882:
-------------------------------
    Description: 
When balancing by table, the StochasticLoadBalancer builds the cluster state 
from the region distribution of a single table. In that case, the 
TableSkewCostFunction should be replaced by the RegionCountSkewCostFunction 
whenever the cluster state contains fewer than 2 tables.

The main problem is that TableSkewCostFunction causes unnecessary calculation 
steps when there is only one table.

For example, with 40000+ regions for one table in one group, balancing this 
table may take a large number of calculation steps. When balancing by table, 
the cost of RegionCountSkewCostFunction is already computed, so we can avoid 
computing the duplicate cost of TableSkewCostFunction.
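
The proposed selection can be sketched as follows. This is only an 
illustration of the idea in the description: the class and method names are 
hypothetical, not HBase's API, and the cost functions are represented by their 
names rather than by real objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from the HBase source.
final class CostFunctionSelectionSketch {

    // Returns the names of the skew cost functions that would be evaluated
    // for a cluster state containing numTables tables.
    static List<String> activeSkewCosts(int numTables) {
        List<String> active = new ArrayList<>();
        active.add("RegionCountSkewCostFunction");
        // Per the description, with fewer than 2 tables the table skew cost
        // duplicates the region count skew cost, so it is skipped.
        if (numTables >= 2) {
            active.add("TableSkewCostFunction");
        }
        return active;
    }

    public static void main(String[] args) {
        System.out.println(activeSkewCosts(1)); // balancing by table: one table
        System.out.println(activeSkewCosts(3)); // cluster-wide state: several tables
    }
}
```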

 

  !image-2021-08-17-20-20-12-768.png|width=907,height=164!

  was:
When balancing by table, the StochasticLoadBalancer builds the cluster state 
from the region distribution of a single table. In that case, the 
TableSkewCostFunction should be replaced by the RegionCountSkewCostFunction 
whenever the cluster state contains fewer than 2 tables.

The main problem is that TableSkewCostFunction causes unnecessary calculation 
steps when there is only one table. The cost it computes may also be 
incorrect.

For example, suppose there are 5 online regionservers and only one table with 
exactly one region, so the cluster state is [0,0,0,0,1]. The cost of 
TableSkewCostFunction will then be 1 (the expected value is 0), because max=1, 
min=0.25, value=1. The computedMaxSteps will therefore be larger than 0, and 
some balance actions will be generated to decrease the cost, but all of these 
actions are meaningless with respect to the count skew.
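
The arithmetic in this example can be checked with a minimal sketch. It 
assumes the cost is a min-max scaling of value into [0, 1], that is 
(value - min) / (max - min); the class and method names are hypothetical and 
the code is not copied from the HBase source. The inputs are the figures 
quoted in the example.

```java
// Hypothetical sketch of min-max scaling; not the HBase implementation.
final class SkewCostSketch {

    // Scales value into [0, 1] relative to [min, max] (assumed form of the cost).
    static double scale(double min, double max, double value) {
        if (max <= min || value <= min) {
            return 0.0;
        }
        return Math.min(1.0, (value - min) / (max - min));
    }

    public static void main(String[] args) {
        // Figures quoted in the description: max=1, min=0.25, value=1.
        // value equals max, so the scaled cost is 1.0 even though the single
        // region cannot be distributed any more evenly.
        System.out.println(scale(0.25, 1.0, 1.0));
    }
}
```

Because value equals max here, the scaled cost is 1 for any min below max, 
which is why the balancer keeps searching for actions that cannot help.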

 


> TableSkewCostFunction may cost unnecessary calculation steps when balancing 
> by table
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-25882
>                 URL: https://issues.apache.org/jira/browse/HBASE-25882
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 3.0.0-alpha-1, 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: image-2021-08-17-20-20-12-768.png
>
>
> When balancing by table, the StochasticLoadBalancer builds the cluster state 
> from the region distribution of a single table. In that case, the 
> TableSkewCostFunction should be replaced by the RegionCountSkewCostFunction 
> whenever the cluster state contains fewer than 2 tables.
> The main problem is that TableSkewCostFunction causes unnecessary 
> calculation steps when there is only one table.
> For example, with 40000+ regions for one table in one group, balancing this 
> table may take a large number of calculation steps. When balancing by table, 
> the cost of RegionCountSkewCostFunction is already computed, so we can avoid 
> computing the duplicate cost of TableSkewCostFunction.
>  
>   !image-2021-08-17-20-20-12-768.png|width=907,height=164!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
