Charlie Qiangeng Xu created HBASE-17039:
-------------------------------------------
Summary: SimpleLoadBalancer schedules large amount of invalid
region moves
Key: HBASE-17039
URL: https://issues.apache.org/jira/browse/HBASE-17039
Project: HBase
Issue Type: Bug
Components: Balancer
Affects Versions: 1.2.3, 1.1.6, 2.0.0
Reporter: Charlie Qiangeng Xu
Assignee: Charlie Qiangeng Xu
Fix For: 2.0.0, 1.2.3, 1.1.6
After increasing one of our clusters to 1600 nodes, we observed a large number
of invalid region moves (more than 30000 thousand moves) fired by the balance chore.
We simulated the problem and printed out the balance plan, and found that many
servers holding two regions of a certain table (we use the by-table strategy)
sent both regions to two other servers that held zero regions.
In the SimpleLoadBalancer's balanceCluster function,
the code block that determines the underLoadedServers might have a problem:
    if (load >= min && load > 0) {
      continue; // look for other servers which haven't reached min
    }
    int regionsToPut = min - load;
    if (regionsToPut == 0) {
      regionsToPut = 1;
    }
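If min is zero, a server whose load is zero (which equals min) fails the
"load > 0" test, so it is not skipped; regionsToPut is then 0 and gets bumped
to 1. The server is therefore marked as underloaded, which causes the problem
described above.
Since we increased the cluster's size to 1600+ nodes, many tables that have only
1000 regions average fewer than one region per server and now hit this issue.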
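
The effect of the check quoted above can be reproduced in isolation. The
following is a minimal sketch, not the actual SimpleLoadBalancer code: the class
and method names are hypothetical, it assumes min is the floor of the average
regions per server, and the distribution (500 servers with two regions each,
1100 servers with none) is made up to match the 1000-region, 1600-server case.
regionsToPutStrict is only one possible alternative check for comparison, not
necessarily the fix that should be committed.

```java
public class UnderloadedCheckSketch {

    // Mirrors the quoted check: number of regions a server would be asked
    // to take, or 0 if the server is skipped.
    static int regionsToPutQuoted(int load, int min) {
        if (load >= min && load > 0) {
            return 0; // skipped: already at or above min
        }
        int regionsToPut = min - load;
        if (regionsToPut == 0) {
            regionsToPut = 1; // load == 0 and min == 0 ends up here
        }
        return regionsToPut;
    }

    // Hypothetical alternative for comparison: skip every server whose
    // load is already at or above min, including load == min == 0.
    static int regionsToPutStrict(int load, int min) {
        return load >= min ? 0 : min - load;
    }

    // Assumption: min is the floor of the average load per server.
    static int floorAverage(int[] loads) {
        long total = 0;
        for (int load : loads) {
            total += load;
        }
        return (int) (total / loads.length);
    }

    // Counts servers that the quoted check treats as underloaded.
    static int countUnderloadedQuoted(int[] loads, int min) {
        int count = 0;
        for (int load : loads) {
            if (regionsToPutQuoted(load, min) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] loads = new int[1600];     // 1600 servers, all starting at 0
        for (int i = 0; i < 500; i++) {
            loads[i] = 2;                // 500 servers hold 2 regions each
        }
        int min = floorAverage(loads);   // 1000 regions / 1600 servers
        System.out.println("min = " + min);
        System.out.println("underloaded (quoted check) = "
            + countUnderloadedQuoted(loads, min));
    }
}
```

With these assumed inputs, every empty server is asked to take one region even
though min is already satisfied, which matches the balance plan we printed.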