[ 
https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729693#comment-14729693
 ] 

Biju Nair commented on HBASE-14215:
-----------------------------------

Thanks [~enis] for your comments. Disabling rack awareness will enable SLB to 
come up with a better plan even with a lower 
{{hbase.master.balancer.stochastic.primaryRegionCountCost}}. Will try to run 
some tests.
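
To make the multiplier trade-off concrete, here is a minimal sketch. It assumes 
the balancer sums each cost term scaled by its multiplier and keeps a move only 
when that sum drops. The class, the method names and the per-move deltas are 
hypothetical; only the multipliers 500 and 10000 come from this issue.

```java
// Hypothetical sketch, not HBase code: two cost terms, each weighted by a multiplier.
class WeightedCostSketch {
    // Assumed effect of one candidate move on each term (0..1 scale):
    // primary skew improves by 0.1, replica rack placement worsens by 0.01.
    static final double PRIMARY_SKEW_DELTA = -0.1;
    static final double RACK_DELTA = 0.01;

    /** Change in total cost: sum of multiplier * delta over both terms. */
    static double totalDelta(double primaryMultiplier, double rackMultiplier) {
        return primaryMultiplier * PRIMARY_SKEW_DELTA + rackMultiplier * RACK_DELTA;
    }

    /** A move is kept only if it lowers the total weighted cost. */
    static boolean accepted(double primaryMultiplier, double rackMultiplier) {
        return totalDelta(primaryMultiplier, rackMultiplier) < 0;
    }

    public static void main(String[] args) {
        System.out.println("primary 500,   rack 10000: " + accepted(500, 10000));
        System.out.println("primary 10000, rack 10000: " + accepted(10000, 10000));
        System.out.println("primary 500,   rack off:   " + accepted(500, 0));
    }
}
```

With these assumed deltas the move is rejected at 500 with the rack term on, 
and accepted either at 10000 or at 500 with the rack term switched off.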

Given that potential candidates are generated randomly, one would assume that 
a "global optimum" will be reached after enough candidate generations and that 
the search will not get stuck in a "local optimum". No?
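
A toy sketch of why random candidates alone might not rule out a local optimum, 
assuming the search keeps a candidate only when it lowers the cost (that is an 
assumption about the balancer, not something confirmed in this thread). The 
cost landscape and all names below are made up for illustration.

```java
import java.util.Random;

// Hypothetical sketch, not HBase code: greedy search over a 1-D cost landscape.
class HillClimbSketch {
    // Made-up costs: index 1 is a local minimum, index 5 is the global minimum.
    static final double[] COST = {5, 3, 4, 6, 2, 0};

    /** Try a random neighbour each step; keep it only if the cost drops. */
    static int search(double[] cost, int start, int steps, long seed) {
        Random rnd = new Random(seed);
        int current = start;
        for (int i = 0; i < steps; i++) {
            int candidate = current + (rnd.nextBoolean() ? 1 : -1);
            if (candidate < 0 || candidate >= cost.length) {
                continue; // candidate falls outside the landscape
            }
            if (cost[candidate] < cost[current]) {
                current = candidate; // accept only strict improvements
            }
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println("start 0 ends at index " + search(COST, 0, 1000, 42L));
        System.out.println("start 4 ends at index " + search(COST, 4, 1000, 42L));
    }
}
```

Starting at index 0 the search settles at index 1, because both neighbours of 
index 1 cost more, even though index 5 is cheaper. More candidate generations 
do not help unless a worse move is sometimes accepted or a single candidate can 
jump further.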

As we included a new cost function for primary replication skew, would taking 
primary replicas into account in the candidate generator (maybe in 
{{RegionReplicaCandidateGenerator}}) help keep 
{{hbase.master.balancer.stochastic.primaryRegionCountCost}} lower?

> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient 
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-14215
>                 URL: https://issues.apache.org/jira/browse/HBASE-14215
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer
>            Reporter: Biju Nair
>            Priority: Minor
>         Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function 
> {{PrimaryRegionCountSkewCostFunction}} to calculate the cost of total 
> primary replication skew doesn't seem to be sufficient to prevent skews 
> (refer to HBASE-14110). We would want the default cost to be a higher value 
> so that skew in primary region replicas has a higher cost. The following is 
> the test result from setting the multiplier value to 10000 (same as the 
> region replica rack cost multiplier) on a 3-rack, 9-RS-node cluster, which 
> seems to get the balancer to distribute the primaries uniformly.
> *Initial Primary replica distribution - using the current multiplier*
> | r1n10 | 102 |
> | r1n11 | 85 |
> | r1n9 | 88 |
> | r2n10 | 120 |
> | r2n11 | 120 |
> | r2n9 | 124 |
> | r3n10 | 135 |
> | r3n11 | 124 |
> | r3n9 | 129 |
> *After long duration of read & writes - using current multiplier*
> | r1n10 | 102 |
> | r1n11 | 85 |
> | r1n9 | 88 |
> | r2n10 | 120 |
> | r2n11 | 120 |
> | r2n9 | 124 |
> | r3n10 | 135 |
> | r3n11 | 124 |
> | r3n9 | 129 |
> *After manual balancing*
> | r1n10 | 102 |
> | r1n11 | 85 |
> | r1n9 | 88 |
> | r2n10 | 120 |
> | r2n11 | 120 |
> | r2n9 | 124 |
> | r3n10 | 135 |
> | r3n11 | 124 |
> | r3n9 | 129 |
> *Increased multiplier for primaryRegionCountSkewCost to 10000*
> | r1n10 | 114 |
> | r1n11 | 113 |
> | r1n9 | 114 |
> | r2n10 | 114 |
> | r2n11 | 114 |
> | r2n9 | 113 |
> | r3n10 | 115 |
> | r3n11 | 115 |
> | r3n9 | 115 |
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 10000 
> should help general HBase use.
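
The per-server counts quoted in the description can be summarized with a small 
sketch. The class and method names are hypothetical, and the max-minus-min 
spread is only a rough stand-in for skew, not the formula used by 
{{PrimaryRegionCountSkewCostFunction}}.

```java
import java.util.Arrays;

// Hypothetical sketch, not HBase code: summarize primaries per region server.
class PrimarySkewSketch {
    // Counts copied from the tables in the issue description (r1n10 .. r3n9).
    static final int[] BEFORE = {102, 85, 88, 120, 120, 124, 135, 124, 129};
    static final int[] AFTER = {114, 113, 114, 114, 114, 113, 115, 115, 115};

    /** Total number of primary replicas across all servers. */
    static int total(int[] counts) {
        return Arrays.stream(counts).sum();
    }

    /** Difference between the most and least loaded server. */
    static int spread(int[] counts) {
        int max = Arrays.stream(counts).max().getAsInt();
        int min = Arrays.stream(counts).min().getAsInt();
        return max - min;
    }

    public static void main(String[] args) {
        System.out.println("multiplier 500:   total=" + total(BEFORE) + " spread=" + spread(BEFORE));
        System.out.println("multiplier 10000: total=" + total(AFTER) + " spread=" + spread(AFTER));
    }
}
```

Both distributions hold 1027 primaries; the spread between the most and least 
loaded server drops from 50 with the current multiplier to 2 with 10000.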



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
