[ https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730875#comment-14730875 ]
Biju Nair commented on HBASE-14215:
-----------------------------------
Primary replica distribution with the rack cost disabled
({{hbase.master.balancer.stochastic.regionReplicaRackCostKey=0}}) and
{{hbase.master.balancer.stochastic.primaryRegionCountCost=500}}:
**Primaries randomly assigned with the balancer off**
| r3n9 | 112 |
| r1n10 | 108 |
| r2n9 | 116 |
| r2n10 | 134 |
| r1n11 | 119 |
| r2n11 | 109 |
| r3n10 | 119 |
| r3n11 | 115 |
| r1n9 | 95 |
**Primary distribution after balancer run**
| r3n9 | 114 |
| r1n10 | 115 |
| r2n9 | 114 |
| r2n10 | 114 |
| r1n11 | 114 |
| r2n11 | 114 |
| r3n10 | 114 |
| r1n9 | 114 |
| r3n11 | 114 |
As expected, the primary replicas get distributed uniformly (114 or 115 per
region server) even with a low cost multiplier for
{{hbase.master.balancer.stochastic.primaryRegionCountCost}} when rack awareness
is disabled.
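
To put a number on "uniformly distributed", the two tables in this comment can be summarized with a few lines of Python. This is only a sketch for checking the figures: the counts are copied from the tables, and the {{spread}} helper is a hypothetical name, not part of HBase.

```python
from statistics import pstdev

# Primary replica counts per region server, copied from the two tables
# in this comment (balancer off, then after a balancer run).
before = {"r3n9": 112, "r1n10": 108, "r2n9": 116, "r2n10": 134, "r1n11": 119,
          "r2n11": 109, "r3n10": 119, "r3n11": 115, "r1n9": 95}
after = {"r3n9": 114, "r1n10": 115, "r2n9": 114, "r2n10": 114, "r1n11": 114,
         "r2n11": 114, "r3n10": 114, "r1n9": 114, "r3n11": 114}


def spread(counts):
    """Summarize how evenly the primaries are spread across servers."""
    values = list(counts.values())
    return {
        "total": sum(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "stdev": pstdev(values),  # population standard deviation
    }


if __name__ == "__main__":
    for label, counts in (("before", before), ("after", after)):
        print(label, spread(counts))
```

Both tables add up to 1027 primaries. The gap between the most and least loaded server drops from 39 (134 vs. 95) to 1 (115 vs. 114), which is the best possible spread for 1027 primaries over 9 servers.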
> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient
> ---------------------------------------------------------------------------
>
> Key: HBASE-14215
> URL: https://issues.apache.org/jira/browse/HBASE-14215
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Reporter: Biju Nair
> Priority: Minor
> Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function
> {{PrimaryRegionCountSkewCostFunction}} to calculate the cost of total
> primary replica skew doesn't seem to be sufficient to prevent skews
> (see HBASE-14110). We would want the default cost to be a higher value so
> that skew in primary region replicas has a higher cost. The following is the
> test result of setting the multiplier value to 10000 (same as the region
> replica rack cost multiplier) on a 3-rack, 9-RS-node cluster, which seems to
> get the balancer to distribute the primaries uniformly.
> *Initial Primary replica distribution - using the current multiplier*
> | r1n10 | 102 |
> | r1n11 | 85 |
> | r1n9 | 88 |
> | r2n10 | 120 |
> | r2n11 | 120 |
> | r2n9 | 124 |
> | r3n10 | 135 |
> | r3n11 | 124 |
> | r3n9 | 129 |
> *After a long duration of reads & writes - using the current multiplier*
> | r1n10 | 102 |
> | r1n11 | 85 |
> | r1n9 | 88 |
> | r2n10 | 120 |
> | r2n11 | 120 |
> | r2n9 | 124 |
> | r3n10 | 135 |
> | r3n11 | 124 |
> | r3n9 | 129 |
> *After manual balancing*
> | r1n10 | 102 |
> | r1n11 | 85 |
> | r1n9 | 88 |
> | r2n10 | 120 |
> | r2n11 | 120 |
> | r2n9 | 124 |
> | r3n10 | 135 |
> | r3n11 | 124 |
> | r3n9 | 129 |
> *Increased multiplier for primaryRegionCountSkewCost to 10000*
> | r1n10 | 114 |
> | r1n11 | 113 |
> | r1n9 | 114 |
> | r2n10 | 114 |
> | r2n11 | 114 |
> | r2n9 | 113 |
> | r3n10 | 115 |
> | r3n11 | 115 |
> | r3n9 | 115 |
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 10000
> should help general HBase use.
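
The effect of the multiplier described in the quoted issue can be illustrated with a toy model. The sketch below assumes the balancer scores a cluster state as a weighted sum of multiplier times a cost in [0, 1]; that is a simplification for illustration, not HBase code. The raw cost of 0.05 and the names {{total_cost}} and {{share}} are hypothetical. Only the multipliers 500 and 10000 come from the issue.

```python
def total_cost(weighted_terms):
    """Weighted sum of (multiplier * cost); each cost is assumed to be in [0, 1]."""
    return sum(multiplier * cost for multiplier, cost in weighted_terms.values())


def share(weighted_terms, name):
    """Fraction of the total cost contributed by one named term."""
    multiplier, cost = weighted_terms[name]
    return multiplier * cost / total_cost(weighted_terms)


# Hypothetical raw cost, identical for both terms so that only the
# multipliers differ. The value itself is made up for illustration.
RAW = 0.05

# Rack cost multiplier 10000 with primary count multiplier 500 vs. 10000.
with_500 = {"rack": (10000, RAW), "primary": (500, RAW)}
with_10000 = {"rack": (10000, RAW), "primary": (10000, RAW)}

if __name__ == "__main__":
    print("primary share at 500:  ", share(with_500, "primary"))
    print("primary share at 10000:", share(with_10000, "primary"))
```

Under these assumptions, the primary count term contributes 500/10500 (under 5%) of the total when its multiplier is 500, and half of the total when it is 10000. That is consistent with the observation in this comment that 500 is enough once the rack cost is set to 0.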