[ https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696795#comment-14696795 ]
Enis Soztutar commented on HBASE-14215:
---------------------------------------
Agreed that 10K is too high for this cost. We have
{{hbase.master.balancer.stochastic.regionReplicaRackCostKey=10000}} and
{{hbase.master.balancer.stochastic.regionReplicaHostCostKey=100000}}. These are
explicitly set high so that they dominate the other costs and have the effect
of a soft constraint on replica anti-colocation. Setting
{{PrimaryRegionCountSkewCostFunction=10000}} probably still lets host-based
replica placement work correctly, but rack-based anti-colocation is given up,
because the primary skew cost then weighs as much as the rack cost.
Biju, did you try with smaller costs? You should be able to set it via conf
without changing the code.
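
To illustrate the trade-off, here is a minimal sketch in Python. It assumes
that each cost function returns a value scaled to [0, 1] and that candidate
plans are compared by the sum of multiplier times scaled cost. The rack and
host multipliers are the ones quoted above; the two candidate plans, their
scaled costs, and the names `total_cost` and `preferred_plan` are hypothetical
and are not taken from the HBase code.

```python
# Multipliers quoted in this thread.
RACK_MULTIPLIER = 10000.0
HOST_MULTIPLIER = 100000.0


def total_cost(primary_multiplier, scaled_costs):
    """Weighted sum of scaled costs (each assumed to lie in [0, 1])."""
    multipliers = {
        "host": HOST_MULTIPLIER,
        "rack": RACK_MULTIPLIER,
        "primary_skew": primary_multiplier,
    }
    return sum(multipliers[name] * scaled_costs[name] for name in multipliers)


# Hypothetical candidate plans with made-up scaled costs.
# KEEP_RACKS preserves rack anti-colocation but leaves some primary skew.
KEEP_RACKS = {"host": 0.0, "rack": 0.0, "primary_skew": 0.10}
# FIX_SKEW evens out primaries but co-locates a few replicas on one rack.
FIX_SKEW = {"host": 0.0, "rack": 0.05, "primary_skew": 0.0}


def preferred_plan(primary_multiplier):
    """Return the name of the plan with the lower total cost."""
    keep = total_cost(primary_multiplier, KEEP_RACKS)
    fix = total_cost(primary_multiplier, FIX_SKEW)
    return "keep_racks" if keep <= fix else "fix_skew"


for multiplier in (500.0, 2000.0, 10000.0):
    print(multiplier, preferred_plan(multiplier))
```

With these made-up numbers, multipliers of 500 and 2000 keep rack
anti-colocation, while 10000 gives it up in favor of fixing the primary skew.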
> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient
> ---------------------------------------------------------------------------
>
> Key: HBASE-14215
> URL: https://issues.apache.org/jira/browse/HBASE-14215
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Reporter: Biju Nair
> Priority: Minor
> Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function
> `PrimaryRegionCountSkewCostFunction` to calculate the cost of total
> primary replica skew doesn't seem to be sufficient to prevent skew
> (refer to HBASE-14110). We would want the default cost to be a higher value
> so that skew in primary region replicas has a higher cost. The following is
> the test result from setting the multiplier value to 10000 (same as the
> region replica rack cost multiplier) on a 3-rack, 9-RS-node cluster, which
> seems to get the balancer to distribute the primaries uniformly.
> *Initial Primary replica distribution - using the current multiplier*
> r1n10 102
> r1n11 85
> r1n9 88
> r2n10 120
> r2n11 120
> r2n9 124
> r3n10 135
> r3n11 124
> r3n9 129
> *After long duration of read & writes - using current multiplier*
> r1n10 102
> r1n11 85
> r1n9 88
> r2n10 120
> r2n11 120
> r2n9 124
> r3n10 135
> r3n11 124
> r3n9 129
> *After manual balancing*
> r1n10 102
> r1n11 85
> r1n9 88
> r2n10 120
> r2n11 120
> r2n9 124
> r3n10 135
> r3n11 124
> r3n9 129
> *Increased multiplier for primaryRegionCountSkewCost to 10000*
> r1n10 114
> r1n11 113
> r1n9 114
> r2n10 114
> r2n11 114
> r2n9 113
> r3n10 115
> r3n11 115
> r3n9 115
> Setting the `PrimaryRegionCountSkewCostFunction` multiplier value to 10000
> should help general HBase use.
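
The skew in the reported distributions can be quantified with a short Python
sketch. The counts are copied from the description above; the helper names
`spread` and `per_rack`, and the assumption that the first two characters of a
node name (`r1`, `r2`, `r3`) identify its rack, are illustrative only.

```python
# Primary replica counts per region server, copied from the issue description.
BEFORE = {
    "r1n10": 102, "r1n11": 85, "r1n9": 88,
    "r2n10": 120, "r2n11": 120, "r2n9": 124,
    "r3n10": 135, "r3n11": 124, "r3n9": 129,
}
AFTER = {
    "r1n10": 114, "r1n11": 113, "r1n9": 114,
    "r2n10": 114, "r2n11": 114, "r2n9": 113,
    "r3n10": 115, "r3n11": 115, "r3n9": 115,
}


def spread(counts):
    """Difference between the most and least loaded region server."""
    return max(counts.values()) - min(counts.values())


def per_rack(counts):
    """Sum primaries per rack, assuming the rack is the 2-character name prefix."""
    totals = {}
    for node, count in counts.items():
        rack = node[:2]
        totals[rack] = totals.get(rack, 0) + count
    return totals


print("spread before:", spread(BEFORE), "after:", spread(AFTER))
print("per rack before:", per_rack(BEFORE))
print("per rack after:", per_rack(AFTER))
```

Both distributions hold the same 1027 primaries. The gap between the most and
least loaded server drops from 50 to 2, and the per-rack totals go from
275/364/388 to 341/341/345.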