[ 
https://issues.apache.org/jira/browse/HBASE-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-9601:
----------------------------------

    Fix Version/s:     (was: 0.96.1)
                       (was: 0.98.0)

No patch, undoing fix versions.

> Use a faster balancer when the imbalance is in the hundreds of regions
> ----------------------------------------------------------------------
>
>                 Key: HBASE-9601
>                 URL: https://issues.apache.org/jira/browse/HBASE-9601
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.96.0
>            Reporter: Jean-Daniel Cryans
>
> Something I'm noticing is that the new balancer is good at optimizing the 
> balance when it needs to move a handful of regions, but once the imbalance is 
> in the hundreds then it might be better if we used something speedier.
> For example, I have a small 5-node cluster that I use to test with a lot of 
> regions, 10k to be precise. The average load should be 2000, but killing one 
> RS will make the average go to 2500, and when the RS comes back there are 2000 
> regions to move. When I call the balancer it spends 30 seconds to be able to 
> move 150-300 regions, so in order to get back to a good balance cluster-wide 
> it takes me a few runs, but if the balancer was doing it by itself (with the 
> 5-minute wait), it could take hours. Maybe not a bad thing in prod, although 
> getting your locality back might be a good idea, as well as offloading the 
> machines.
> So it seems that we need to be able to detect this situation and only balance 
> based on load average, and maybe locality (so that the original regions would 
> move back, hopefully).
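
To make the arithmetic in the description concrete, here is a minimal sketch of a pass that balances on load average only. It is not HBase's balancer code: the function name balance_by_load and the server names rs1..rs5 are hypothetical, and locality is ignored entirely. It only shows that leveling region counts for the 5-node, 10k-region scenario needs 2000 moves.

```python
from typing import Dict, List, Tuple


def balance_by_load(regions_per_server: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (source, destination, count) moves that level region counts.

    Hypothetical helper for illustration; it looks at counts only.
    """
    total = sum(regions_per_server.values())
    floor_avg, extra = divmod(total, len(regions_per_server))

    # Most-loaded servers first; they take the "+1" targets when the total
    # does not divide evenly, which keeps the number of moves small.
    ordered = sorted(regions_per_server, key=regions_per_server.get, reverse=True)
    target = {s: floor_avg + (1 if i < extra else 0) for i, s in enumerate(ordered)}

    surplus = [[s, regions_per_server[s] - target[s]]
               for s in ordered if regions_per_server[s] > target[s]]
    deficit = [[s, target[s] - regions_per_server[s]]
               for s in ordered if regions_per_server[s] < target[s]]

    moves: List[Tuple[str, str, int]] = []
    i = j = 0
    # Greedily pair each overloaded server with an underloaded one.
    while i < len(surplus) and j < len(deficit):
        count = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], count))
        surplus[i][1] -= count
        deficit[j][1] -= count
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves


# Scenario from the description: one RS was killed and has just come back empty.
cluster = {"rs1": 2500, "rs2": 2500, "rs3": 2500, "rs4": 2500, "rs5": 0}
moves = balance_by_load(cluster)
total_moved = sum(count for _, _, count in moves)  # 2000 regions
```

At the reported 150-300 regions per balancer run, those 2000 moves correspond to roughly 7 to 14 runs.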



--
This message was sent by Atlassian JIRA
(v6.1#6144)
