[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063652#comment-16063652 ]

Hadoop QA commented on HBASE-18164:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 58s {color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 18s {color} | {color:red} HBASE-18164 does not apply to branch-1. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:395d9a0 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874541/18164.branch-1.addendum.txt |
| JIRA Issue | HBASE-18164 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7338/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Much faster locality cost function and candidate generator
> ----------------------------------------------------------
>
>                 Key: HBASE-18164
>                 URL: https://issues.apache.org/jira/browse/HBASE-18164
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Kahlil Oppenheimer
>            Assignee: Kahlil Oppenheimer
>            Priority: Critical
>             Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
>         Attachments: 18164.branch-1.addendum.txt, HBASE-18164-00.patch, 
> HBASE-18164-01.patch, HBASE-18164-02.patch, HBASE-18164-04.patch, 
> HBASE-18164-05.patch, HBASE-18164-06.patch, HBASE-18164-07.patch, 
> HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with
> cluster size. On our smaller clusters (~17 tables, ~12 region servers, ~5k
> regions), the balancer considers ~100,000 cluster configurations in 60s per
> balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables,
> ~160 region servers, ~13k regions).
> As a result, our bigger clusters do not converge on balance as quickly for
> things like table skew and region load, because the balancer does not have
> enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference
> between the best locality that is possible given the current cluster state
> and the currently measured locality. The old locality computation measured
> the locality cost as the difference between the current locality and 100%
> locality; the new computation instead takes the difference between the
> current locality for a given region and the best locality for that region
> in the cluster.
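
To make the description above concrete, here is a minimal Java sketch of an incremental locality cost over a cached locality table. It is not the code from the attached patches: the class name `IncrementalLocalityCost`, its methods, and the choice to average the per-region gap over the number of regions are all assumptions made for illustration. The table `locality[r][s]` stands in for the per-region, per-server locality that the description says is cached once at the start of a balancer run.

```java
/**
 * Hypothetical sketch of an incremental locality cost; NOT the HBase implementation.
 * locality[r][s] is the cached fraction of region r's data that is local to server s.
 */
final class IncrementalLocalityCost {
    private final double[][] locality;   // cached once, never recomputed per iteration
    private final int[] regionToServer;  // current (proposed) assignment
    private final double bestSum;        // sum over regions of the best achievable locality
    private double currentSum;           // sum over regions of locality on the assigned server

    IncrementalLocalityCost(double[][] locality, int[] initialAssignment) {
        this.locality = locality;
        this.regionToServer = initialAssignment.clone();
        double best = 0.0;
        double current = 0.0;
        for (int r = 0; r < locality.length; r++) {
            double max = 0.0;
            for (double v : locality[r]) {
                max = Math.max(max, v);
            }
            best += max;
            current += locality[r][regionToServer[r]];
        }
        this.bestSum = best;
        this.currentSum = current;
    }

    /** Applies one proposed move in O(1) instead of rescanning every region. */
    void regionMoved(int region, int newServer) {
        int oldServer = regionToServer[region];
        currentSum += locality[region][newServer] - locality[region][oldServer];
        regionToServer[region] = newServer;
    }

    /**
     * Average gap between best achievable and current locality.
     * Dividing by the region count is an assumed normalization, not taken from the patch.
     */
    double cost() {
        int n = locality.length;
        return n == 0 ? 0.0 : (bestSum - currentSum) / n;
    }
}
```

In this sketch a candidate move costs one table lookup for the old server and one for the new server, so the per-iteration work does not grow with the number of regions or servers. The cost reaches zero when every region sits on its best-locality server, even if that best locality is below 100%.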



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
