[ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-17110:
--------------------------
     Hadoop Flags: Reviewed
    Fix Version/s: 2.0.0
     Release Note: After HBASE-17110, the byTable strategy for 
SimpleLoadBalancer will also take server-level balance into account.
      Description: 
Currently, the byTable strategy can still leave server-level imbalance; this 
JIRA improves that.

Some more background:
When operating large-scale clusters (our case), some companies still prefer to 
use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
generation, etc. The current SimpleLoadBalancer has two modes: 
1. byTable, which only guarantees that the regions of each table are 
uniformly distributed. 
2. byCluster, which ignores the distribution within tables and balances all 
regions together.
If the pressure differs between tables, the byTable option is preferable in 
most cases. Yet this choice sacrifices cluster-level balance and can leave some 
servers with significantly higher load, e.g. 242 regions on server A but 417 
regions on server B (real-world stats).
Consider this case: a cluster has 3 tables and 4 servers:
{noformat}
  server A has 3 regions: table1:1, table2:1, table3:1
  server B has 3 regions: table1:2, table2:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 0 regions.
{noformat}
From the byTable strategy's perspective, the cluster is already perfectly 
balanced at the table level. But an ideal state would look like this:
{noformat}
  server A has 2 regions: table2:1, table3:1
  server B has 2 regions: table1:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 2 regions: table1:1, table2:2
{noformat}
We can see the server loads change from 3,3,3,0 to 2,2,3,2, while table1, 
table2 and table3 each remain balanced. This is the goal of this JIRA.
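
The example above can be checked with a small sketch. It is illustrative only: 
the class BalanceCheck and its methods are hypothetical names, not part of 
HBase, and "spread" here simply means the largest region count minus the 
smallest one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HBase.
class BalanceCheck {

    // Largest minus smallest total region count across servers.
    static int serverSpread(Map<String, List<String>> assignment) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (List<String> regions : assignment.values()) {
            min = Math.min(min, regions.size());
            max = Math.max(max, regions.size());
        }
        return max - min;
    }

    // Largest minus smallest per-server region count for a single table.
    static int tableSpread(Map<String, List<String>> assignment, String table) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (List<String> regions : assignment.values()) {
            int count = 0;
            for (String region : regions) {
                if (region.startsWith(table + ":")) {
                    count++;
                }
            }
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        return max - min;
    }

    // The first layout from the description: loads 3,3,3,0.
    static Map<String, List<String>> before() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("A", Arrays.asList("table1:1", "table2:1", "table3:1"));
        m.put("B", Arrays.asList("table1:2", "table2:2", "table3:2"));
        m.put("C", Arrays.asList("table1:3", "table2:3", "table3:3"));
        m.put("D", Collections.<String>emptyList());
        return m;
    }

    // The second layout from the description: loads 2,2,3,2.
    static Map<String, List<String>> after() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("A", Arrays.asList("table2:1", "table3:1"));
        m.put("B", Arrays.asList("table1:2", "table3:2"));
        m.put("C", Arrays.asList("table1:3", "table2:3", "table3:3"));
        m.put("D", Arrays.asList("table1:1", "table2:2"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println("server spread before: " + serverSpread(before()));
        System.out.println("server spread after:  " + serverSpread(after()));
        for (String table : Arrays.asList("table1", "table2", "table3")) {
            System.out.println(table + " spread before/after: "
                + tableSpread(before(), table) + "/" + tableSpread(after(), table));
        }
    }
}
```

In both layouts every table has a spread of 1 across the four servers, while 
the server-level spread drops from 3 to 1.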

Two UTs will be added as well, with the last one demonstrating the advantage 
of the new strategy. Also, an onConfigurationChange method will be implemented 
so that the "slop" variable can be changed at runtime.
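
As a rough sketch of what changing "slop" at runtime could look like (this is 
not the patch itself): the class SlopSketch, the use of a plain Map in place 
of a Hadoop Configuration, the key "hbase.regions.slop", the default of 0.2, 
and the floor/ceiling check around the average load are all assumptions made 
for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the HBase implementation.
// A real balancer would receive a Hadoop Configuration object, not a Map.
class SlopSketch {
    // Assumed configuration key and default value.
    static final String SLOP_KEY = "hbase.regions.slop";
    private volatile double slop = 0.2;

    // Re-read slop when the configuration is reloaded, without a restart.
    void onConfigurationChange(Map<String, String> conf) {
        String value = conf.get(SLOP_KEY);
        if (value != null) {
            slop = Double.parseDouble(value);
        }
    }

    // Assumed rule: a server is out of balance when its load falls outside
    // [floor(avg * (1 - slop)), ceil(avg * (1 + slop))].
    boolean needsBalance(int[] serverLoads) {
        double total = 0;
        for (int load : serverLoads) {
            total += load;
        }
        double average = total / serverLoads.length;
        int floor = (int) Math.floor(average * (1 - slop));
        int ceiling = (int) Math.ceil(average * (1 + slop));
        for (int load : serverLoads) {
            if (load < floor || load > ceiling) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SlopSketch balancer = new SlopSketch();
        int[] loads = {3, 3, 3, 0};
        System.out.println("needs balance: " + balancer.needsBalance(loads));

        // Simulate a configuration reload that loosens the tolerance.
        Map<String, String> conf = new HashMap<>();
        conf.put(SLOP_KEY, "1.0");
        balancer.onConfigurationChange(conf);
        System.out.println("needs balance: " + balancer.needsBalance(loads));
    }
}
```

Under these assumptions, a larger slop widens the accepted range around the 
average load, so the same server loads can stop triggering a balance run 
after the value is reloaded.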

We have been using the strategy on our largest cluster for several months, so 
its effect has been validated to some extent.



 

  was:
This JIRA is an enhancement of SimpleLoadBalancer. It introduces a new 
strategy, "bytableOverall", which can be enabled by adding:
{noformat}
<property>
  <name>hbase.master.loadbalance.bytableOverall</name>
  <value>true</value>
</property>
{noformat}
We have been using the strategy on our largest cluster for several months. It 
has proven to be very helpful and stable, and the result is quite visible to 
users.

Here is the reason why it's helpful:
When operating large-scale clusters (our case), some companies still prefer to 
use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
generation, etc. The current SimpleLoadBalancer has two modes: 
1. byTable, which only guarantees that the regions of each table are 
uniformly distributed. 
2. byCluster, which ignores the distribution within tables and balances all 
regions together.
If the pressure differs between tables, the byTable option is preferable in 
most cases. Yet this choice sacrifices cluster-level balance and can leave some 
servers with significantly higher load, e.g. 242 regions on server A but 417 
regions on server B (real-world stats).
Consider this case: a cluster has 3 tables and 4 servers:
{noformat}
  server A has 3 regions: table1:1, table2:1, table3:1
  server B has 3 regions: table1:2, table2:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 0 regions.
{noformat}
From the byTable strategy's perspective, the cluster is already perfectly 
balanced at the table level. But an ideal state would look like this:
{noformat}
  server A has 2 regions: table2:1, table3:1
  server B has 2 regions: table1:2, table3:2
  server C has 3 regions: table1:3, table2:3, table3:3
  server D has 2 regions: table1:1, table2:2
{noformat}
We can see the server loads change from 3,3,3,0 to 2,2,3,2, while table1, 
table2 and table3 each remain balanced. 
This is what the new mode "byTableOverall" can achieve.

Two UTs have been added as well, and the last one demonstrates the advantage 
of the new strategy.
Also, an onConfigurationChange method has been implemented so that the "slop" 
variable can be changed at runtime.



 

          Summary: Improve SimpleLoadBalancer to always take server-level 
balance into account  (was: Improve SimpleLoadBalancer to consider server level 
balance)

Update the description and add release note

> Improve SimpleLoadBalancer to always take server-level balance into account
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-17110
>                 URL: https://issues.apache.org/jira/browse/HBASE-17110
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.2.4
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, 
> HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110-V6.patch, 
> HBASE-17110-V7.patch, HBASE-17110-V8.patch, HBASE-17110.patch
>
>
> Currently, the byTable strategy can still leave server-level imbalance; this 
> JIRA improves that.
> Some more background:
> When operating large-scale clusters (our case), some companies still prefer 
> to use {{SimpleLoadBalancer}} due to its simplicity, quick balance plan 
> generation, etc. The current SimpleLoadBalancer has two modes: 
> 1. byTable, which only guarantees that the regions of each table are 
> uniformly distributed. 
> 2. byCluster, which ignores the distribution within tables and balances all 
> regions together.
> If the pressure differs between tables, the byTable option is preferable in 
> most cases. Yet this choice sacrifices cluster-level balance and can leave 
> some servers with significantly higher load, e.g. 242 regions on server A 
> but 417 regions on server B (real-world stats).
> Consider this case: a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster is already perfectly 
> balanced at the table level. But an ideal state would look like this:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while table1, 
> table2 and table3 each remain balanced. This is the goal of this JIRA.
> Two UTs will be added as well, with the last one demonstrating the advantage 
> of the new strategy. Also, an onConfigurationChange method will be 
> implemented so that the "slop" variable can be changed at runtime.
> We have been using the strategy on our largest cluster for several months, so 
> its effect has been validated to some extent.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
