[ https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charlie Qiangeng Xu updated HBASE-17039:
----------------------------------------
    Description: 
After increasing one of our clusters to 1600 nodes, we observed a large number
of invalid region moves (more than 30k) scheduled by the balancer chore. We
simulated the problem and printed out the balance plan, and found that many
servers hosting two regions of a given table (we use the by-table balancing
strategy) sent both regions to two other servers that hosted no regions.
In SimpleLoadBalancer's balanceCluster method, the code block that determines
the underloaded servers appears to be wrong:
      if (load >= min && load > 0) {
        continue; // look for other servers which haven't reached min
      }
      int regionsToPut = min - load;
      if (regionsToPut == 0)
      {
        regionsToPut = 1;
      }
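If min is zero, a server whose load is zero (and therefore equal to min) is
still marked as underloaded and assigned one region, because regionsToPut is
bumped from 0 to 1. This causes the behavior described above.
Since we increased the cluster to 1600+ servers, many tables with only 1000
regions now hit this issue: with fewer regions than servers, min (the integer
quotient of regions over servers) is zero.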
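The effect of the quoted condition when min is zero can be reproduced outside HBase. The sketch below is a hypothetical, simplified model, not HBase code: the class and method names (UnderloadedSketch, quotedCheck, skipServersAtMin) are made up, servers are plain strings mapped to their region count for one table, everything else balanceCluster does is omitted, and min is assumed to be the integer quotient of regions over servers. The second method shows one possible alternative condition that skips servers already at min; it is an assumption for illustration, not necessarily the change committed for this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, stripped-down model of the underloaded-server check quoted above.
 * Servers are plain String names mapped to their region count for one table;
 * the rest of balanceCluster is omitted. Not HBase code.
 */
public class UnderloadedSketch {

  /** Same condition as the quoted snippet: an empty server is never skipped. */
  static Map<String, Integer> quotedCheck(Map<String, Integer> loads, int numRegions) {
    int min = numRegions / loads.size(); // integer division: 0 whenever regions < servers
    Map<String, Integer> underloaded = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : loads.entrySet()) {
      int load = e.getValue();
      if (load >= min && load > 0) {
        continue; // look for other servers which haven't reached min
      }
      int regionsToPut = min - load;
      if (regionsToPut == 0) {
        regionsToPut = 1; // min == 0 and load == 0: an empty server still asks for a region
      }
      underloaded.put(e.getKey(), regionsToPut);
    }
    return underloaded;
  }

  /** One possible alternative condition (an assumption, not necessarily the committed fix). */
  static Map<String, Integer> skipServersAtMin(Map<String, Integer> loads, int numRegions) {
    int min = numRegions / loads.size();
    Map<String, Integer> underloaded = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : loads.entrySet()) {
      int load = e.getValue();
      if (load >= min) {
        continue; // at or above min; with min == 0 this includes empty servers
      }
      underloaded.put(e.getKey(), min - load);
    }
    return underloaded;
  }

  public static void main(String[] args) {
    // Hypothetical layout: 1000 regions of one table on 1600 servers,
    // 500 servers holding two regions each and 1100 servers holding none.
    Map<String, Integer> loads = new LinkedHashMap<>();
    for (int i = 0; i < 1600; i++) {
      loads.put("rs" + i, i < 500 ? 2 : 0);
    }
    System.out.println("quoted check: " + quotedCheck(loads, 1000).size() + " underloaded");
    System.out.println("skip-at-min:  " + skipServersAtMin(loads, 1000).size() + " underloaded");
  }
}
```

In this hypothetical layout min is 1000 / 1600 = 0, so the quoted condition flags all 1100 empty servers as underloaded (one region each), while the alternative condition flags none. When min is greater than zero, both methods return the same result.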



  was:
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves(more than 30k moves) fired by balance chore. Thus we 
simulated the problem and printed out the balance plan, only to find out many 
server that had two regions for a certain table(we use by table strategy), sent 
out both regions to other two servers that have zero regions. 
In the SimpleLoadBalancer's balanceCluster function,
the code block that determines the underLoadedServers might have a problem:
      if (load >= min && load > 0) {
        continue; // look for other servers which haven't reached min
      }
      int regionsToPut = min - load;
      if (regionsToPut == 0)
      {
        regionsToPut = 1;
      }
if min is zero, some server that has load of zero, which equals to min would be 
marked as underloaded, which would cause such problem mentioned above.
Since we increase the cluster's size to 1600+, many table only have 1000 
regions, now would encounter such issue.




> SimpleLoadBalancer schedules large amount of invalid region moves
> -----------------------------------------------------------------
>
>                 Key: HBASE-17039
>                 URL: https://issues.apache.org/jira/browse/HBASE-17039
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.1.6, 1.2.3
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>             Fix For: 2.0.0, 1.1.6, 1.2.3
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
