[ 
https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670683#comment-15670683
 ] 

Yu Li commented on HBASE-17110:
-------------------------------

Thanks for chiming in [~anoop.hbase].

We made it configurable because the stricter cluster-level balance uses a more 
precise criterion, so turning it on may move regions more aggressively (though 
it's true that {{hbase.regions.slop}} gives us finer control over that). Since 
moving a big batch of regions causes load spikes and hurts online 
performance/stability, we chose to leave the config off by default and instead 
supplied a shell command to trigger the cluster-level balance manually (and 
please open another JIRA to upstream the hbase shell tool, fella [~xharlie]). 
This risk of automatically moving a big batch of regions is also the reason 
we're not using {{StochasticLoadBalancer}} online, FWIW.
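
To illustrate how the slop bounds the aggressiveness, here is a minimal Java 
sketch. It assumes the balancer treats the cluster as balanced when every 
server's region count lies between floor(avg * (1 - slop)) and 
ceil(avg * (1 + slop)). The names {{SlopSketch}} and {{needsBalance}} are 
hypothetical, and this is not the HBase source.

```java
// Sketch only: one common reading of how a slop value bounds balancing.
// Class and method names are hypothetical, not taken from HBase.
final class SlopSketch {
    /**
     * Returns true when at least one server's region count falls outside
     * [floor(avg * (1 - slop)), ceil(avg * (1 + slop))].
     */
    static boolean needsBalance(int[] regionsPerServer, double slop) {
        if (regionsPerServer.length == 0) {
            return false; // nothing to balance
        }
        int total = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int n : regionsPerServer) {
            total += n;
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        double average = (double) total / regionsPerServer.length;
        int floor = (int) Math.floor(average * (1 - slop));
        int ceiling = (int) Math.ceil(average * (1 + slop));
        return max > ceiling || min < floor;
    }
}
```

With illustrative loads of 10, 10, 10 and 14 regions, a slop of 0.2 gives 
bounds of 8 and 14, so no move is needed, while a slop of 0 gives bounds of 11 
and 11, so a balance run (and region moves) would be triggered.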

For the master branch, I agree that we could be more aggressive and turn the 
option ON by default so that people learn about it, but we should add a clear 
release note about the change.

Let me know your thoughts, thanks.

> Add an "Overall Strategy" option(balanced both on table level and server 
> level) to SimpleLoadBalancer
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17110
>                 URL: https://issues.apache.org/jira/browse/HBASE-17110
>             Project: HBase
>          Issue Type: New Feature
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.2.4
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>         Attachments: HBASE-17110.patch, SimpleBalancerBytableOverall.V1
>
>
> This JIRA is about an enhancement to {{SimpleLoadBalancer}}. We introduce a 
> new strategy, "bytableOverall", which can be enabled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using this strategy on our largest cluster for several months. 
> It has proven to be helpful and stable, and the improvement is clearly 
> visible to users.
> Here is why it helps:
> When operating large-scale clusters (our case), some companies still prefer 
> {{SimpleLoadBalancer}} for its simplicity, quick balance plan generation, 
> etc. The current {{SimpleLoadBalancer}} has two modes: 
> 1. byTable, which only guarantees that the regions of each table are 
> uniformly distributed. 
> 2. byCluster, which ignores the distribution within tables and balances all 
> regions together.
> If the pressure differs across tables, the first option, byTable, is 
> preferable in most cases. Yet this choice sacrifices cluster-level balance 
> and can leave some servers with significantly higher load, e.g. 242 regions 
> on server A but 417 regions on server B (real-world stats).
> Consider a cluster with 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster is already perfectly 
> balanced at the table level. But the ideal state would be:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> We can see the server loads change from 3,3,3,0 to 2,2,3,2, while table1, 
> table2 and table3 each remain balanced. 
> This is what the new "bytableOverall" mode achieves.
> Two UTs have been added as well; the last one demonstrates the advantage of 
> the new strategy.
> Also, an onConfigurationChange method has been implemented so that the 
> "slop" variable can be changed at runtime.
>  
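
The two example layouts in the description above can be checked with a small 
Java sketch. The matrix encoding and the names {{LayoutCheck}}, 
{{serverLoads}} and {{tableSpreads}} are hypothetical and used for 
illustration only; this is not code from the patch.

```java
// Illustration only: checks the two layouts from the issue description.
// All names here are hypothetical and not part of HBase.
final class LayoutCheck {
    // Rows are servers A..D, columns are table1..table3.
    // layout[s][t] = number of regions of table t hosted on server s.
    static final int[][] BEFORE = {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {0, 0, 0}};
    static final int[][] AFTER = {{0, 1, 1}, {1, 0, 1}, {1, 1, 1}, {1, 1, 0}};

    /** Total number of regions hosted on each server. */
    static int[] serverLoads(int[][] layout) {
        int[] loads = new int[layout.length];
        for (int s = 0; s < layout.length; s++) {
            for (int n : layout[s]) {
                loads[s] += n;
            }
        }
        return loads;
    }

    /** For each table, max minus min region count across all servers. */
    static int[] tableSpreads(int[][] layout) {
        int tables = layout.length == 0 ? 0 : layout[0].length;
        int[] spreads = new int[tables];
        for (int t = 0; t < tables; t++) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int[] server : layout) {
                min = Math.min(min, server[t]);
                max = Math.max(max, server[t]);
            }
            spreads[t] = max - min;
        }
        return spreads;
    }
}
```

Here serverLoads(BEFORE) yields 3,3,3,0 and serverLoads(AFTER) yields 2,2,3,2, 
while tableSpreads is 1,1,1 for both layouts. Each table stays as even as 3 
regions over 4 servers allow, and only the server-level imbalance shrinks.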



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
