[ https://issues.apache.org/jira/browse/HBASE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HBASE-24139.
----------------------------------
Fix Version/s: 1.7.0, 2.3.0, 3.0.0
Hadoop Flags: Reviewed
Resolution: Fixed
Pushed to master, branch-2, branch-2.3, and branch-1. Thanks for the
contribution, [~bea0113].
> Balancer should avoid leaving idle region servers
> -------------------------------------------------
>
> Key: HBASE-24139
> URL: https://issues.apache.org/jira/browse/HBASE-24139
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, Operability
> Reporter: Sean Busbey
> Assignee: Beata Sudi
> Priority: Critical
> Labels: beginner
> Fix For: 3.0.0, 2.3.0, 1.7.0
>
>
> After HBASE-15529 the StochasticLoadBalancer makes the decision to run based
> on its internal cost functions rather than the simple region count skew of
> BaseLoadBalancer.
> Given the default weights for those cost functions, the default minimum cost
> that indicates a need to rebalance, and a density of ~90 regions per region
> server, the balancer is not very responsive to newly added region servers at
> non-trivial cluster sizes:
> * For clusters ~10 nodes, the defaults think a single RS at 0 regions means
> we need to balance
> * For clusters >20 nodes, the defaults will not consider a single RS at 0
> regions to mean we need to balance. 2 RS at 0 will cause it to balance.
> * For clusters ~100 nodes, having 6 RS with no regions will still not meet
> the threshold to cause a balance.
> Note that this is the decision to look at balancer plans at all. The
> calculation is severely dominated by the region count skew (it has weight 500
> and all other weights combined are ~105), so barring a very significant
> change in all other cost functions this condition will persist indefinitely.
> Two possible approaches:
> * add a new cost function that's essentially "don't have RS with 0 regions"
> that an operator can tune
> * add a short circuit condition for the {{needsBalance}} method that checks
> for empty RS similar to the check we do for colocated region replicas
> For those currently hitting this, an easy workaround is to set
> {{hbase.master.balancer.stochastic.minCostNeedBalance}} to {{0.01}}. This
> will mean that a single RS having 0 regions will cause the balancer to run
> for clusters of up to ~90 region servers. It's essentially the same as the
> default slop of 0.01 used by the BaseLoadBalancer.
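
The arithmetic behind the thresholds quoted above, and the second proposed approach, can be sketched roughly as follows. This is a simplified model, not the balancer's actual code. It assumes the 500 and ~105 figures are the region count skew weight and the combined weight of the other cost functions, that the default minimum cost is 0.05, that the skew cost is the total deviation from the mean divided by its worst case, and that every other cost function contributes 0. The class and method names are hypothetical. It lands in the same range as the figures in the description but will not match them exactly, and borderline cases such as ~100 servers are sensitive to the exact weights.

```java
public class BalancerThresholdSketch {
    // Assumption: weight of the region count skew cost function (500 per the description).
    static final double REGION_COUNT_WEIGHT = 500.0;
    // Assumption: combined weight of all other active cost functions (~105).
    static final double OTHER_WEIGHTS_TOTAL = 105.0;

    /** Builds a cluster where `idle` servers hold 0 regions and the rest hold `regionsPerBusyServer`. */
    static int[] cluster(int servers, int idle, int regionsPerBusyServer) {
        int[] regions = new int[servers];
        for (int i = idle; i < servers; i++) {
            regions[i] = regionsPerBusyServer;
        }
        return regions;
    }

    /** Simplified skew cost: total deviation from the mean, divided by the worst case. */
    static double skewCost(int[] regions) {
        if (regions.length == 0) {
            return 0.0;
        }
        double total = 0;
        for (int r : regions) {
            total += r;
        }
        double mean = total / regions.length;
        double deviation = 0;
        for (int r : regions) {
            deviation += Math.abs(r - mean);
        }
        // Worst case: every region on one server. The best-case offset is ignored here.
        double worst = 2 * (total - mean);
        return worst <= 0 ? 0.0 : deviation / worst;
    }

    /** Weighted cost normalized by the sum of weights; other cost functions are assumed to be 0. */
    static double normalizedCost(int[] regions) {
        return REGION_COUNT_WEIGHT * skewCost(regions)
            / (REGION_COUNT_WEIGHT + OTHER_WEIGHTS_TOTAL);
    }

    /** True if the modeled cost reaches the configured minimum cost. */
    static boolean wouldBalance(int[] regions, double minCostNeedBalance) {
        return normalizedCost(regions) >= minCostNeedBalance;
    }

    /** Hypothetical short-circuit: some server is empty while another holds more than one region. */
    static boolean hasIdleServer(int[] regions) {
        boolean idle = false;
        boolean loaded = false;
        for (int r : regions) {
            if (r == 0) {
                idle = true;
            }
            if (r > 1) {
                loaded = true;
            }
        }
        return idle && loaded;
    }

    public static void main(String[] args) {
        int[][] cases = {{10, 1}, {21, 1}, {21, 2}};
        for (int[] c : cases) {
            int[] regions = cluster(c[0], c[1], 90);
            System.out.printf("%d servers, %d idle: cost=%.4f default=%b workaround=%b idleCheck=%b%n",
                c[0], c[1], normalizedCost(regions),
                wouldBalance(regions, 0.05), wouldBalance(regions, 0.01),
                hasIdleServer(regions));
        }
    }
}
```

Under these assumptions a single empty server in a 21-node cluster stays below 0.05 but clears 0.01, while the idle-server check fires regardless of cluster size, which is the appeal of the short-circuit approach.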
--
This message was sent by Atlassian Jira
(v8.3.4#803005)