[
https://issues.apache.org/jira/browse/HBASE-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-9601:
----------------------------------
Fix Version/s: (was: 0.96.1)
               (was: 0.98.0)
No patch, undoing fix versions.
> Use a faster balancer when the imbalance is in the hundreds of regions
> ----------------------------------------------------------------------
>
> Key: HBASE-9601
> URL: https://issues.apache.org/jira/browse/HBASE-9601
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.96.0
> Reporter: Jean-Daniel Cryans
>
> Something I'm noticing is that the new balancer is good at optimizing the
> balance when it needs to move a handful of regions, but once the imbalance is
> in the hundreds then it might be better if we used something speedier.
> For example, I have a small 5-node cluster that I use to test with a lot of
> regions, 10k to be precise. The average load should be 2000, but killing one
> RS makes the average go to 2500, and when the RS comes back there are 2000
> regions to move. When I call the balancer, it spends 30 seconds to be able to
> move 150-300 regions, so getting back to a good cluster-wide balance
> takes me a few runs, but if the balancer were doing it by itself (with the
> 5-minute wait), it could take hours. Maybe not a bad thing in prod, although
> getting your locality back might be a good idea, as well as offloading the
> machines.
> So it seems that we need to be able to detect this situation and only balance
> based on load average, and maybe locality (so that the original regions would
> move back, hopefully).
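
To make the proposal above concrete, here is a minimal sketch of what "detect
this situation and only balance based on load average" could look like. It is
not HBase code: the function names, the server names, and the threshold of 100
regions are all assumptions made up for illustration, and locality is ignored
entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff for "imbalance in the hundreds"; not an HBase setting.
FAST_PATH_THRESHOLD = 100


def _targets(loads: Dict[str, int]) -> Dict[str, int]:
    """Per-server target counts: the average, with any remainder given to
    the currently most loaded servers so that fewer regions have to move."""
    floor, extra = divmod(sum(loads.values()), len(loads))
    ordered = sorted(loads, key=lambda s: (-loads[s], s))
    return {s: floor + (1 if i < extra else 0) for i, s in enumerate(ordered)}


def regions_over_target(loads: Dict[str, int]) -> int:
    """Total number of regions sitting on servers above their target."""
    if not loads:
        return 0
    targets = _targets(loads)
    return sum(max(0, loads[s] - targets[s]) for s in loads)


def should_use_fast_balancer(loads: Dict[str, int]) -> bool:
    """True when the imbalance is large enough to skip the slower optimizer."""
    return regions_over_target(loads) >= FAST_PATH_THRESHOLD


def load_only_plan(loads: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Greedy plan of (source, destination, region count) moves that evens
    out region counts and considers nothing else."""
    if not loads:
        return []
    targets = _targets(loads)
    names = sorted(loads)
    donors = [[s, loads[s] - targets[s]] for s in names if loads[s] > targets[s]]
    takers = [[s, targets[s] - loads[s]] for s in names if loads[s] < targets[s]]
    moves: List[Tuple[str, str, int]] = []
    d = t = 0
    # Total surplus equals total deficit, so both lists run out together.
    while d < len(donors) and t < len(takers):
        n = min(donors[d][1], takers[t][1])
        moves.append((donors[d][0], takers[t][0], n))
        donors[d][1] -= n
        takers[t][1] -= n
        if donors[d][1] == 0:
            d += 1
        if takers[t][1] == 0:
            t += 1
    return moves


if __name__ == "__main__":
    # The scenario from the description: one RS of five has just come back.
    cluster = {"rs1": 2500, "rs2": 2500, "rs3": 2500, "rs4": 2500, "rs5": 0}
    print(regions_over_target(cluster), should_use_fast_balancer(cluster))
    for move in load_only_plan(cluster):
        print(move)
```

For the cluster in the description, the sketch counts 2000 regions over target
and plans four moves of 500 regions each onto the returning server. A real
implementation would also have to pick which regions to move; preferring the
ones that used to live on the returning server would address the locality
point raised above.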