[ 
https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532088#comment-17532088
 ] 

Xiaolin Ha commented on HBASE-25768:
------------------------------------

[~filtertip], thanks. I think the main reason why disabling balanceByTable and 
increasing the multiplier of the table skew cost works well is that the 
computedMaxSteps needed in one round of balancing the cluster is far smaller: 
it balances all the tables on the cluster in one round. 

But when balanceByTable is enabled, balancing the cluster needs 
computedMaxSteps*tableCount steps in one round. It is time-consuming, but it 
balances more accurately, because it considers all the cost functions for each 
table, instead of converging progressively by balancing the cluster time after 
time. With balanceByTable disabled, the costs of all tables are computed 
together, so a table whose pressure is skewed may not get balanced.
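
To make the step counts above concrete, here is a rough sketch. It is not HBase code: the class and method names are hypothetical, the numbers in main are made up, and it follows the simplification used above that every table gets the same computedMaxSteps.

```java
// Hypothetical sketch, not HBase code: compares the search-step budget of one
// balancer round with and without balanceByTable, as described in the comment.
final class BalanceStepBudget {
    private BalanceStepBudget() {}

    /**
     * Steps needed for one round: a single search over the whole cluster, or
     * one search per table when balanceByTable is enabled.
     */
    static long stepsPerRound(long computedMaxSteps, int tableCount, boolean balanceByTable) {
        if (computedMaxSteps < 0 || tableCount < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return balanceByTable
            ? Math.multiplyExact(computedMaxSteps, (long) tableCount)
            : computedMaxSteps;
    }

    public static void main(String[] args) {
        long computedMaxSteps = 1_000_000L; // assumed value, for illustration only
        int tableCount = 200;               // assumed value, for illustration only
        System.out.println("cluster-wide: " + stepsPerRound(computedMaxSteps, tableCount, false));
        System.out.println("by table:     " + stepsPerRound(computedMaxSteps, tableCount, true));
    }
}
```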

Here, if we use an overall checker to trigger a coarse balance, it can speed up 
balancing when the cluster/table has skew issues, because the time spent 
calculating costs is also multiplied by the number of cost functions and the 
time each cost function takes.
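
One possible shape for such an overall checker is sketched below. Everything in it is an assumption for illustration: the class, the method names, the "max - min > 1" definition of a skewed table, and the threshold are all hypothetical and are not taken from HBase or from any patch on this issue.

```java
import java.util.Map;

// Hypothetical sketch, not HBase code: an "overall checker" that decides
// whether to run a coarse balance before the normal stochastic search.
final class CoarseBalanceChecker {
    private CoarseBalanceChecker() {}

    /**
     * Assumed definition of skew: a table is skewed when the region counts on
     * its most and least loaded servers differ by more than one.
     * The array holds one entry per server, including servers with 0 regions.
     */
    static boolean isSkewed(int[] regionsPerServer) {
        if (regionsPerServer.length == 0) {
            return false;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int count : regionsPerServer) {
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        return max - min > 1;
    }

    /** Fraction of tables that are skewed; 0.0 when there are no tables. */
    static double skewedTableRatio(Map<String, int[]> regionsPerServerByTable) {
        if (regionsPerServerByTable.isEmpty()) {
            return 0.0;
        }
        long skewed = regionsPerServerByTable.values().stream()
            .filter(CoarseBalanceChecker::isSkewed)
            .count();
        return (double) skewed / regionsPerServerByTable.size();
    }

    /** True when the skewed-table ratio reaches the (assumed) threshold. */
    static boolean needsCoarseBalance(Map<String, int[]> regionsPerServerByTable,
                                      double threshold) {
        return skewedTableRatio(regionsPerServerByTable) >= threshold;
    }

    public static void main(String[] args) {
        Map<String, int[]> tables = Map.of(
            "t1", new int[] {10, 10, 10},
            "t2", new int[] {30, 0, 0},   // all regions on one server
            "t3", new int[] {5, 6, 5});
        System.out.println("skewed ratio: " + skewedTableRatio(tables));
        System.out.println("coarse balance first: " + needsCoarseBalance(tables, 0.3));
    }
}
```

When the checker returns true, the balancer could take the cheap path (a simple region-count balance, or only a few light cost functions) and leave the full set of cost functions for later rounds.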

> Support an overall coarse and fast balance strategy for StochasticLoadBalancer
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-25768
>                 URL: https://issues.apache.org/jira/browse/HBASE-25768
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 3.0.0-alpha-1, 2.0.0, 1.4.13
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>
> When we use StochasticLoadBalancer + balanceByTable, we could face two 
> difficulties.
>  # For each table, its regions are distributed uniformly, but across the 
> overall cluster there is still an imbalance between RSes;
>  # When there is a large-scale restart of RSes, or an expansion of groups or 
> the cluster, we hope the balancer can execute as soon as possible, but the 
> StochasticLoadBalancer may need a lot of time to compute costs.
> We can detect these circumstances in StochasticLoadBalancer (such as by using 
> the percentage of skewed tables), and before trying the normal balance steps, 
> we can add a strategy that lets it just balance like the SimpleLoadBalancer 
> or use a few light cost functions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
