[
https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703160#comment-14703160
]
Biju Nair commented on HBASE-14215:
-----------------------------------
Thanks all for the comments. Below is my reasoning, based on my limited
understanding, for setting
{{hbase.master.balancer.stochastic.primaryRegionCountCost}} to 10000 when
region replication is > 1.
- The high value of 100000 set for {{regionReplicaHostCostKey}} helps reduce or
eliminate replicas of the same region landing on the same host, which in turn
improves availability. Eliminating duplicate replicas on a host also helps
performance, since secondary calls go to different hosts and the query load is
distributed.
- The function that reduces duplicate region replicas on the same rack, which
uses the multiplier {{regionReplicaRackCostKey}}, helps availability but does
less for query performance, since queries are distributed to servers without
regard to rack.
- The new function that reduces skew of primary region replicas across servers
distributes the primaries uniformly over all the servers, which in turn
distributes query load and improves performance, since by default all queries
are serviced by primary replicas.
Since duplicate replicas on servers are eliminated by the high cost of 100000,
which also helps performance, the next criterion was to balance rack-level
availability against request performance. By setting
{{primaryRegionCountCost}} equal to {{regionReplicaRackCostKey}}, which is
10000, the assumption was that the candidate cluster that ends up being used
would be balanced for both availability and performance. Let me know what was
overlooked; it will help with my understanding.
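To illustrate the reasoning above, here is a minimal sketch of how the multipliers trade off against each other. It assumes a simplified model in which the balancer's total cost is a weighted sum of per-function costs, each normalized to [0, 1]. The short names in the dictionary and the normalized cost values are hypothetical, chosen only for illustration; the multipliers are the ones quoted in this thread.

```python
# Simplified model (an assumption, not the actual balancer code):
# total cost = sum(multiplier * normalized_cost) over the cost functions.
MULTIPLIERS = {
    "regionReplicaHostCost": 100000,  # value quoted in this comment
    "regionReplicaRackCost": 10000,   # value quoted in this comment
    "primaryRegionCountCost": 500,    # current default per the issue text
}


def total_cost(costs, multipliers):
    """Weighted sum of the normalized per-function costs."""
    return sum(multipliers[name] * cost for name, cost in costs.items())


def share(name, costs, multipliers):
    """Fraction of the total cost contributed by one cost function."""
    return multipliers[name] * costs[name] / total_cost(costs, multipliers)


# Hypothetical normalized costs: no duplicate replicas on any host,
# and equally sized rack and primary-skew problems.
costs = {
    "regionReplicaHostCost": 0.0,
    "regionReplicaRackCost": 0.1,
    "primaryRegionCountCost": 0.1,
}

current = share("primaryRegionCountCost", costs, MULTIPLIERS)
proposed = share(
    "primaryRegionCountCost",
    costs,
    {**MULTIPLIERS, "primaryRegionCountCost": 10000},
)
print(f"share of total cost at 500:   {current:.3f}")
print(f"share of total cost at 10000: {proposed:.3f}")
```

Under these assumed inputs, the primary skew term carries far less weight than the rack term at a multiplier of 500, and equal weight at 10000, which is the balance the comment is aiming for.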
As suggested, I will try other cost values and update the ticket. Currently we
are using site.xml to vary the costs.
> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient
> ---------------------------------------------------------------------------
>
> Key: HBASE-14215
> URL: https://issues.apache.org/jira/browse/HBASE-14215
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Reporter: Biju Nair
> Priority: Minor
> Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function
> {{PrimaryRegionCountSkewCostFunction}} to calculate the cost of total
> primary replica skew doesn't seem to be sufficient to prevent the skews
> (see HBASE-14110). We would want the default cost to be a higher value so
> that skews in primary region replicas have a higher cost. The following is
> the test result from setting the multiplier value to 10000 (same as the
> region replica rack cost multiplier) on a 3-rack, 9-RS-node cluster, which
> seems to get the balancer to distribute the primaries uniformly.
> *Initial Primary replica distribution - using the current multiplier*
> r1n10 102
> r1n11 85
> r1n9 88
> r2n10 120
> r2n11 120
> r2n9 124
> r3n10 135
> r3n11 124
> r3n9 129
> *After long duration of read & writes - using current multiplier*
> r1n10 102
> r1n11 85
> r1n9 88
> r2n10 120
> r2n11 120
> r2n9 124
> r3n10 135
> r3n11 124
> r3n9 129
> *After manual balancing*
> r1n10 102
> r1n11 85
> r1n9 88
> r2n10 120
> r2n11 120
> r2n9 124
> r3n10 135
> r3n11 124
> r3n9 129
> *Increased multiplier for primaryRegionCountSkewCost to 10000*
> r1n10 114
> r1n11 113
> r1n9 114
> r2n10 114
> r2n11 114
> r2n9 113
> r3n10 115
> r3n11 115
> r3n9 115
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 10000
> should help HBase in general use.
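
The per-server primary counts quoted in the issue description can be compared with a short script. This is only a sketch for summarizing the numbers reported there; the helper name `summarize` is hypothetical, and the spread and standard deviation are generic skew measures, not the balancer's own cost calculation.

```python
from statistics import pstdev

# Primary replica counts per region server, copied from the issue description.
before = {
    "r1n10": 102, "r1n11": 85, "r1n9": 88,
    "r2n10": 120, "r2n11": 120, "r2n9": 124,
    "r3n10": 135, "r3n11": 124, "r3n9": 129,
}
after = {
    "r1n10": 114, "r1n11": 113, "r1n9": 114,
    "r2n10": 114, "r2n11": 114, "r2n9": 113,
    "r3n10": 115, "r3n11": 115, "r3n9": 115,
}


def summarize(counts):
    """Return simple skew measures for a server -> primary count mapping."""
    values = list(counts.values())
    return {
        "total": sum(values),
        "min": min(values),
        "max": max(values),
        "spread": max(values) - min(values),
        "stdev": pstdev(values),
    }


for label, counts in (("multiplier 500", before), ("multiplier 10000", after)):
    s = summarize(counts)
    print(f"{label}: total={s['total']} min={s['min']} max={s['max']} "
          f"spread={s['spread']} stdev={s['stdev']:.2f}")
```

Both distributions hold the same total number of primaries, so the difference is purely in how evenly they are placed.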