[ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15934170#comment-15934170
 ] 

Hudson commented on HBASE-17039:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-1.3-IT #11 (See 
[https://builds.apache.org/job/HBase-1.3-IT/11/])
HBASE-17059 backport HBASE-17039 (SimpleLoadBalancer schedules large (liyu: rev 
446a21fedd1282c15939eb4c46d13c859beedd7a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java


> SimpleLoadBalancer schedules large amount of invalid region moves
> -----------------------------------------------------------------
>
>                 Key: HBASE-17039
>                 URL: https://issues.apache.org/jira/browse/HBASE-17039
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.3.0, 1.1.7, 1.2.4
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>             Fix For: 2.0.0, 1.4.0, 1.2.5, 1.1.8
>
>         Attachments: HBASE-17039.patch
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> amount of invalid region moves(more than 30k moves) fired by the balance 
> chore. Thus we simulated the problem and printed out the balance plan, only 
> to find out many servers that had two regions for a certain table(we use by 
> table strategy), sent out both regions to other two servers that have zero 
> region. 
> In the SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>       if (load >= min && load > 0) {
>         continue; // look for other servers which haven't reached min
>       }
>       int regionsToPut = min - load;
>       if (regionsToPut == 0)
>       {
>         regionsToPut = 1;
>       }
> {code}
> if min is zero, some server that has load of zero, which equals to min would 
> be marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables that only have 
> 1000 regions, now would encounter such issue.
> By fixing it up, the balance plan went back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to