[
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
Attachment: HBASE-18164-04.patch
> Much faster locality cost function and candidate generator
> ----------------------------------------------------------
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Reporter: Kahlil Oppenheimer
> Assignee: Kahlil Oppenheimer
> Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch,
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12
> region servers, ~5k regions), the balancer considers ~100,000 cluster
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as
> quickly for things like table skew, region load, etc. because the balancer
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it
> only recomputes cost based on the most recent region move proposed by the
> balancer, rather than recomputing the cost across all regions/servers every
> iteration.
> Further, we also cache the locality of every region on every server at the
> beginning of the balancer's execution for both the LocalityBasedCostFunction
> and the LocalityCandidateGenerator to reference. This way, they need not
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4
> QA clusters without issue. The speed improvements we noticed are massive. Our
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference
> between the best locality that is possible given the current cluster state,
> and the currently measured locality. The old locality computation would
> measure the locality cost as the difference from the current locality and
> 100% locality, but this new computation instead takes the difference between
> the current locality for a given region and the best locality for that region
> in the cluster.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)