[
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644764#comment-15644764
]
Charlie Qiangeng Xu edited comment on HBASE-17039 at 11/7/16 5:21 PM:
----------------------------------------------------------------------
I just skimmed through the history of this part of the code, and
the block causing the problem appears to have been introduced by HBASE-7060.
The issue described in that JIRA is now handled properly by other parts of
the current SimpleLoadBalancer logic,
so the code block mentioned above is no longer necessary and is in fact harmful.
[[email protected]], it seems you were involved in that JIRA; would you be
interested in taking a look at this one?
was (Author: xharlie):
I just skimmed through the history of this part of the code, and
the block causing the problem appears to have been introduced by HBASE-7060.
The problem mentioned in that JIRA is now handled properly by other parts of
the current balancer logic,
so the code block mentioned above only causes problems now.
[[email protected]], it seems you were involved in that JIRA; would you be
interested in taking a look at this one?
> SimpleLoadBalancer schedules large amount of invalid region moves
> -----------------------------------------------------------------
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Affects Versions: 2.0.0, 1.2.3, 1.1.7
> Reporter: Charlie Qiangeng Xu
> Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17039.patch
>
>
> After growing one of our clusters to 1600 nodes, we observed a large
> number of invalid region moves (more than 30k) fired by the balance
> chore. We then simulated the problem and printed out the balance plan, only
> to find that many servers holding two regions of a given table (we use the
> by-table strategy) sent both regions to two other servers that held zero
> regions.
> In the SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
> if (load >= min && load > 0) {
>   continue; // look for other servers which haven't reached min
> }
> int regionsToPut = min - load;
> if (regionsToPut == 0) {
>   regionsToPut = 1;
> }
> {code}
> If min is zero, a server whose load is also zero (and therefore equal to min)
> is still marked as underloaded, which causes the behavior described above.
> Since we grew the cluster to 1600+ nodes, many tables that have only
> 1000 regions now hit this issue.
> After fixing this, the balance plan went back to normal.
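To make the failure mode in the quoted description concrete, here is a minimal, self-contained sketch. It is not the actual SimpleLoadBalancer code and not necessarily what the attached patch does. The class and method names are hypothetical, and it assumes that min is the per-table region count divided by the server count, rounded down. It compares the quoted check with one possible correction in which a server at or above min is never treated as underloaded.

```java
// Hypothetical, simplified model of the underloaded-server check.
// Names are illustrative only; this is not HBase source code.
class UnderloadedCheckSketch {

    /** Regions the quoted check would hand to a server (0 means "skipped"). */
    static int regionsToPutOriginal(int load, int min) {
        if (load >= min && load > 0) {
            return 0; // server has already reached min
        }
        int regionsToPut = min - load;
        if (regionsToPut == 0) {
            // Only reachable when load == 0 and min == 0.
            regionsToPut = 1;
        }
        return regionsToPut;
    }

    /** One possible correction: never fill a server that is already at min. */
    static int regionsToPutAdjusted(int load, int min) {
        if (load >= min) {
            return 0;
        }
        return min - load;
    }

    public static void main(String[] args) {
        // Assumption: min is the floor of regions per server for one table.
        int numRegions = 1000;
        int numServers = 1600;
        int min = numRegions / numServers;

        System.out.println("min = " + min);
        System.out.println("original check, empty server: "
            + regionsToPutOriginal(0, min));
        System.out.println("adjusted check, empty server: "
            + regionsToPutAdjusted(0, min));
    }
}
```

Under these assumptions, the original check asks for one region on every empty server even though min is zero, while the adjusted check asks for none. That is consistent with the invalid moves seen in the balance plan.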