[
https://issues.apache.org/jira/browse/HBASE-22300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824540#comment-16824540
]
Biju Nair edited comment on HBASE-22300 at 4/23/19 9:15 PM:
------------------------------------------------------------
One obvious issue is the time taken by all the cost functions
extending
[CostFromRegionLoadFunction|https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L1371].
Each time the cost is calculated in a "step", it loops through all the
regions in [all the servers in the
cluster|https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L1398-L1410],
i.e. number of servers * regions per server * number of cost functions (currently
5). A quick improvement would be to calculate the cost for each server
during the init of the cost function, then recalculate the cost only for the
individual servers affected by the action in a step (i.e. region move, region
assign, etc.) and use those per-server costs to come up with the overall cost.
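A minimal sketch of the idea, not the actual balancer code: the class name, fields and the deviation-from-mean aggregate below are all hypothetical stand-ins, and only the region move action is shown. The per-server sums are built once at init; a move then touches only the two affected servers, so each step costs O(servers) instead of O(regions).

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names exist in HBase. */
final class IncrementalServerCost {
    private final double[] regionLoad;   // per-region load cost, computed once
    private final int[] regionToServer;  // current server index of each region
    private final double[] serverCost;   // cached sum of region loads per server

    IncrementalServerCost(double[] regionLoad, int[] regionToServer, int numServers) {
        this.regionLoad = regionLoad.clone();
        this.regionToServer = regionToServer.clone();
        this.serverCost = new double[numServers];
        // The only full pass over all regions happens here, at init.
        for (int r = 0; r < regionLoad.length; r++) {
            serverCost[regionToServer[r]] += regionLoad[r];
        }
    }

    /** Applies a region move by updating only the source and destination servers. */
    void moveRegion(int region, int toServer) {
        int fromServer = regionToServer[region];
        if (fromServer == toServer) {
            return;
        }
        serverCost[fromServer] -= regionLoad[region];
        serverCost[toServer] += regionLoad[region];
        regionToServer[region] = toServer;
    }

    /**
     * Overall cost derived from the cached per-server costs. The aggregate used
     * here (total absolute deviation from the mean) is an assumption chosen for
     * simplicity, not the balancer's own scaling.
     */
    double cost() {
        double mean = Arrays.stream(serverCost).sum() / serverCost.length;
        double total = 0;
        for (double c : serverCost) {
            total += Math.abs(c - mean);
        }
        return total;
    }

    double serverCost(int server) {
        return serverCost[server];
    }
}
```

One caveat with this approach: repeated add/subtract on doubles can accumulate rounding drift over many steps, so a real implementation might want to rebuild the cached sums periodically.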
> SLB doesn't perform well with increase in number of regions
> -----------------------------------------------------------
>
> Key: HBASE-22300
> URL: https://issues.apache.org/jira/browse/HBASE-22300
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Reporter: Biju Nair
> Priority: Major
>
> As the number of regions in a cluster increases, the number of steps taken by
> the balancer in 30 sec (default balancer runtime) drops noticeably. The
> following is the number of steps taken by the balancer with region loads set
> and without the loads being set, i.e. when the cost functions using region
> loads are not fully exercised.
> {noformat}
> Nodes  Regions  Tables  # of steps        # of steps
>                         (with RS load)    (with no load)
> 5      50       5       200000            200000
> 100    2000     110     104707            1000000
> 100    10000    40      19911             1000000
> 200    100000   400     870               1000000
> {noformat}
> As one would expect, the reduced number of steps also makes the balancer take
> a long time to get to an optimal cost. Note that only 2 data points were used
> in the region load histogram, while in practice 15 region load data points are
> remembered.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)