Jean-Daniel Cryans created HBASE-9601:
-----------------------------------------

             Summary: Use a faster balancer when the imbalance is in the hundreds of regions
                 Key: HBASE-9601
                 URL: https://issues.apache.org/jira/browse/HBASE-9601
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.96.0
            Reporter: Jean-Daniel Cryans
             Fix For: 0.98.0, 0.96.1


Something I'm noticing is that the new balancer is good at optimizing the
balance when it only needs to move a handful of regions, but once the imbalance
is in the hundreds of regions it might be better to use something speedier.

For example, I have a small 5-node cluster that I use to test with a lot of
regions, 10k to be precise. The average load should be 2000, but killing one RS
makes the average go to 2500, and when the RS comes back there are 2000 regions
to move. When I call the balancer it spends 30 seconds and only comes up with
150-300 region moves, so it takes me a few runs to get back to a good balance
cluster-wide. If the balancer were doing it by itself (with the 5-minute wait
between runs), it could take hours. Maybe that's not a bad thing in prod,
although getting your locality back and offloading the overloaded machines
sooner would be good too.
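The numbers above can be checked with a quick back-of-the-envelope
calculation. This is a minimal Java sketch, assuming every balancer run moves
the same number of regions (150 or 300) and that runs are spaced by the
5-minute period; the class and method names are hypothetical and not part of
HBase.

```java
// Hypothetical back-of-the-envelope estimate for the scenario above.
// Assumption: each balancer run moves a fixed number of regions and runs are
// spaced by the 5-minute balancer period mentioned in the report.
class RebalanceEstimate {

    // Average number of regions per live RegionServer (integer division).
    static int averageLoad(int regions, int servers) {
        return regions / servers;
    }

    // Runs needed to move `toMove` regions at `perRun` regions per run (ceiling division).
    static int runsNeeded(int toMove, int perRun) {
        return (toMove + perRun - 1) / perRun;
    }

    public static void main(String[] args) {
        int regions = 10_000;
        int servers = 5;
        int periodMinutes = 5;

        int healthyAvg = averageLoad(regions, servers);      // 2000
        int degradedAvg = averageLoad(regions, servers - 1); // 2500 with one RS down
        // The restarted RS comes back empty, so it must receive a full average share.
        int toMove = healthyAvg;                              // 2000

        System.out.printf("healthy avg=%d, degraded avg=%d, to move=%d%n",
                healthyAvg, degradedAvg, toMove);
        for (int perRun : new int[] {300, 150}) {
            int runs = runsNeeded(toMove, perRun);
            // Rough scale only: runs multiplied by the period between runs.
            System.out.printf("perRun=%d runs=%d approxMinutes=%d%n",
                    perRun, runs, runs * periodMinutes);
        }
    }
}
```

Under these idealized assumptions it takes 7 to 14 runs to place the 2000
regions; any run that moves fewer regions stretches that further.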

So it seems that we need to be able to detect this situation and, in that case,
balance only on load average, and maybe locality (hopefully so that the original
regions would move back).
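A minimal Java sketch of what detecting this situation could look like,
assuming the balancer only sees per-RegionServer region counts. The class, the
FAST_PATH_THRESHOLD value of 100 and the greedy most-loaded-to-least-loaded
plan are hypothetical illustrations, not HBase's balancer API or the fix for
this issue, and the plan ignores locality.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip the expensive cost-based search when the imbalance
// is large, and instead equalize region counts with a cheap greedy plan.
class FastPathBalancer {

    // One region move from server index `from` to server index `to`.
    record Move(int from, int to) {}

    // Hypothetical knob: above this many required moves, use the fast path.
    static final int FAST_PATH_THRESHOLD = 100;

    // Regions that must leave overloaded servers so none ends above ceil(average).
    static int regionsToMove(int[] loads) {
        int total = 0;
        for (int l : loads) total += l;
        int ceil = (total + loads.length - 1) / loads.length;
        int excess = 0;
        for (int l : loads) excess += Math.max(0, l - ceil);
        return excess;
    }

    static boolean useFastPath(int[] loads) {
        return regionsToMove(loads) > FAST_PATH_THRESHOLD;
    }

    // Greedy load-only plan: move one region at a time from the most loaded to
    // the least loaded server until region counts differ by at most one.
    static List<Move> loadOnlyPlan(int[] loads) {
        int[] l = loads.clone();
        List<Move> plan = new ArrayList<>();
        while (true) {
            int max = 0, min = 0;
            for (int i = 1; i < l.length; i++) {
                if (l[i] > l[max]) max = i;
                if (l[i] < l[min]) min = i;
            }
            if (l[max] - l[min] <= 1) return plan;
            l[max]--;
            l[min]++;
            plan.add(new Move(max, min));
        }
    }

    public static void main(String[] args) {
        // Four RegionServers at 2500 regions and one freshly restarted, empty one.
        int[] loads = {2500, 2500, 2500, 2500, 0};
        System.out.println("fast path: " + useFastPath(loads));
        System.out.println("moves: " + loadOnlyPlan(loads).size());
    }
}
```

For the 5-node example this takes the fast path and produces 2000 moves from a
simple loop rather than a cost-based search. The locality part could be layered
on by preferring, for each move, a region whose previous host was the
destination server (also an assumption, not something this sketch tracks).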
