Ray Mattingly created HBASE-29772:
-------------------------------------
Summary: The balancer is too slow with 100k regions and
conditionals enabled
Key: HBASE-29772
URL: https://issues.apache.org/jira/browse/HBASE-29772
Project: HBase
Issue Type: Improvement
Affects Versions: 2.6.4
Reporter: Ray Mattingly
Assignee: Ray Mattingly
We have some clusters with upwards of 100k regions. These clusters also use the
system table isolation, meta table isolation, and read replica distribution
balancer conditionals. We're starting to hit real slowdowns in balancer
performance on clusters like this, particularly in the last mile of balancing.
For example, when only 1 of 300 servers is notably under-utilized, a random
move is very rarely a good one, and each move is slow to evaluate at this scale.
I'd suggest that we make the
[SlopFixingCandidateGenerator|https://github.com/apache/hbase/blob/07de86938c58dfb627c1910f4f8db88d544b600e/hbase-balancer/src/main/java/org/apache/hadoop/hbase/master/balancer/SlopFixingCandidateGenerator.java#L34]
a default candidate generator that may be returned by `getRandomGenerator`.
This should help the balancer find quicker paths to, at least, "unsloppy"
balance before things begin to slow down.
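
To illustrate why a targeted generator helps in the 1-of-300 case, here is a minimal, self-contained sketch. It is not the actual SlopFixingCandidateGenerator code: the class `SlopFixingSketch`, the `Move` type, and `pickMove` are hypothetical names, and the floor/ceiling formula is an assumption about how slop bounds are derived from the average region count. Instead of sampling random server pairs, it goes straight to the most and least loaded servers and proposes a move only when one of them is outside the slop bounds.

```java
import java.util.Optional;

/** Hypothetical sketch of a slop-fixing move picker; not HBase's implementation. */
final class SlopFixingSketch {

  /** A proposed move of one region between two server indexes. */
  static final class Move {
    final int fromServer;
    final int toServer;

    Move(int fromServer, int toServer) {
      this.fromServer = fromServer;
      this.toServer = toServer;
    }
  }

  /**
   * Proposes moving one region from the most loaded server to the least
   * loaded one, but only if either server is outside the slop bounds.
   * Assumption: bounds are floor(avg * (1 - slop)) and ceil(avg * (1 + slop)).
   */
  static Optional<Move> pickMove(int[] regionsPerServer, double slop) {
    if (regionsPerServer.length < 2) {
      return Optional.empty();
    }
    long total = 0;
    int min = 0;
    int max = 0;
    // One pass over the servers to find the extremes and the total.
    for (int i = 0; i < regionsPerServer.length; i++) {
      total += regionsPerServer[i];
      if (regionsPerServer[i] < regionsPerServer[min]) {
        min = i;
      }
      if (regionsPerServer[i] > regionsPerServer[max]) {
        max = i;
      }
    }
    double avg = (double) total / regionsPerServer.length;
    int floor = (int) Math.floor(avg * (1 - slop));
    int ceiling = (int) Math.ceil(avg * (1 + slop));
    boolean sloppy = regionsPerServer[min] < floor || regionsPerServer[max] > ceiling;
    // A gap smaller than 2 cannot be improved by moving a single region.
    if (!sloppy || regionsPerServer[max] - regionsPerServer[min] < 2) {
      return Optional.empty();
    }
    return Optional.of(new Move(max, min));
  }
}
```

With 299 servers holding 334 regions each and one holding 100, `pickMove(loads, 0.2)` targets the under-utilized server on the first call, whereas a uniformly random pick of a destination would land on it only about once in 300 tries. A real generator would also have to respect the balancer conditionals mentioned above, which this sketch ignores.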
--
This message was sent by Atlassian Jira
(v8.20.10#820010)