[
https://issues.apache.org/jira/browse/HBASE-22265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878326#comment-16878326
]
David Manning commented on HBASE-22265:
---------------------------------------
This confused me too, the first time I read the code. Probably the comments
could be improved.
Imagine 5 servers with 50 regions. The worst cost would be 50 regions on one
server, and 0 regions on each of the remaining 4 servers. So the max cost is
the sum of the distance from the mean for each server.
For 0, 0, 0, 0, 50 there's a mean of 10. The first 4 servers are each 10 away
from the mean (mean - 0), and the fifth server is 40 away (total - mean). That
sums to 4 * 10 + 40 = 80, which is why max comes out near twice the total.
That's where the first block comes from:
{code:java}
double max = ((count - 1) * mean) + (total - mean);{code}
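As a quick check of that arithmetic, here is a minimal sketch (not the actual SLB code; the class and method names are made up for illustration) that compares the formula with the summed distance from the mean for the 0, 0, 0, 0, 50 case:

```java
import java.util.Arrays;

// Hypothetical helper for illustration only, not part of HBase.
public class MaxCostSketch {
  // Same expression as the max line quoted above.
  static double maxFromFormula(double[] stats) {
    double total = Arrays.stream(stats).sum();
    double count = stats.length;
    double mean = total / count;
    return ((count - 1) * mean) + (total - mean);
  }

  // Sum of |mean - n| over all servers.
  static double sumOfDistances(double[] stats) {
    double mean = Arrays.stream(stats).sum() / stats.length;
    double cost = 0;
    for (double n : stats) {
      cost += Math.abs(mean - n);
    }
    return cost;
  }

  public static void main(String[] args) {
    double[] worst = {0, 0, 0, 0, 50};
    System.out.println(maxFromFormula(worst)); // prints 80.0
    System.out.println(sumOfDistances(worst)); // prints 80.0
  }
}
```

Both values agree for the worst case, so max is the largest possible sum of distances rather than the largest single value.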
For the second block, imagine 3 servers with 20 regions. The min cost is when
every server hosts a number of regions as close to the mean as possible.
But you can't host a fraction of a region, so there is a theoretical min bound.
That second block calculates how many servers will have to host the mean number
of regions *rounded up*, and how many will host the mean *rounded down*. The
best we could do would be 6, 7, 7, which costs 2 * (7 - 6.67) + 1 * (6.67 - 6),
or about 1.33. So that's the min bound on cost.
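The same example in code, as a rough sketch. The min expression is the one quoted in the issue, but the way numHigh and numLow are derived here is an assumption about what the surrounding code does, and the class name is made up. It only covers the case where there are at least as many regions as servers.

```java
// Hypothetical helper for illustration only, not part of HBase.
public class MinCostSketch {
  static double minCost(double total, double count) {
    double mean = total / count;
    // Assumption: after every server takes floor(mean) regions, each leftover
    // region goes to a different server, which then holds ceil(mean).
    int numHigh = (int) (total - (Math.floor(mean) * count));
    int numLow = (int) (count - numHigh);
    return (numHigh * (Math.ceil(mean) - mean)) + (numLow * (mean - Math.floor(mean)));
  }

  public static void main(String[] args) {
    // 20 regions on 3 servers: the best layout is 6, 7, 7.
    System.out.println(minCost(20, 3)); // roughly 1.33
    // 50 regions on 5 servers divide evenly, so the bound is 0.
    System.out.println(minCost(50, 5));
  }
}
```

When the mean is already a whole number, both terms vanish and the bound is 0.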
> Cost calculation in SLB may not be correct
> ------------------------------------------
>
> Key: HBASE-22265
> URL: https://issues.apache.org/jira/browse/HBASE-22265
> Project: HBase
> Issue Type: Brainstorming
> Components: Balancer
> Reporter: Biju Nair
> Priority: Minor
>
> In
> [CostFromArray|https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L1039]
> method of SLB, the calculated value of {{max}}, which in turn is used to
> scale the cost, "may" not be correct.
> {noformat}
> // Compute max as if all region servers had 0 and one had the sum of all costs. This must be
> // a zero sum cost for this to make sense.
> double max = ((count - 1) * mean) + (total - mean);{noformat}
> With the current calculation, {{max}} ends up close to twice the total of
> all the elements passed in the array (less the mean value), while the
> comment above the calculation seems to imply that {{max}} should be the sum
> of all costs, i.e. the value of the variable {{total}}.
>
> Also it would be good to document the reasoning for the following calculation
> in the same method. I can create a patch if anyone who is familiar with this
> code can help understand the reasoning.
> {noformat}
> min = (numHigh * (Math.ceil(mean) - mean)) + (numLow * (mean - Math.floor(mean)));{noformat}
>