[ 
https://issues.apache.org/jira/browse/HBASE-22265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878326#comment-16878326
 ] 

David Manning commented on HBASE-22265:
---------------------------------------

This confused me too, the first time I read the code. Probably the comments 
could be improved.

Imagine 5 servers with 50 regions. The worst case would be 50 regions on one 
server, and 0 regions on each of the remaining 4 servers. So the max cost is 
the sum, over all servers, of each server's distance from the mean in that case.

For 0, 0, 0, 0, 50 - there's a mean of 10. The first 4 servers are each 10 away 
from the mean (mean - 0), and the fifth server is (total - mean) = 40 away from 
the mean, for a max of 4 * 10 + 40 = 80. That's where the first block comes from:
{code:java}
double max = ((count - 1) * mean) + (total - mean);{code}
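To make the arithmetic above concrete, here is a minimal sketch that compares the closed form with a brute-force sum of distances from the mean. The class and method names ({{MaxCostSketch}}, {{sumOfDeviations}}, {{maxBound}}) are made up for illustration and are not part of HBase; only the formula itself comes from the quoted line.

```java
// Hypothetical helper class, for illustration only (not HBase code).
final class MaxCostSketch {

    /** Brute force: sum of each server's absolute distance from the mean. */
    static double sumOfDeviations(double[] counts) {
        double total = 0;
        for (double c : counts) {
            total += c;
        }
        double mean = total / counts.length;
        double sum = 0;
        for (double c : counts) {
            sum += Math.abs(c - mean);
        }
        return sum;
    }

    /** Closed form from the quoted line: (count - 1) servers at 0, one server holding everything. */
    static double maxBound(double total, int count) {
        double mean = total / count;
        return ((count - 1) * mean) + (total - mean);
    }
}
```

With the 0, 0, 0, 0, 50 example, both {{sumOfDeviations}} on that array and {{maxBound(50, 5)}} come out to 80, which is the value the scaling would treat as the worst possible cost.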
For the second block, imagine 3 servers with 20 regions. The min cost is 
reached when every server hosts a number of regions as close to the mean as 
possible. But you can't host a fraction of a region, so the theoretical min 
bound is not always zero. That second block calculates how many servers will 
have to host the mean number of regions *rounded up*, and how many will host 
the mean *rounded down*. With a mean of 6.67, the best we could do would be 6, 
7, 7, which costs 2 * (7 - 6.67) + 1 * (6.67 - 6), or about 1.33. So that's the 
min bound on cost.
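The same reasoning can be sketched in code. The {{min}} expression below is the one quoted in the issue description; the way {{numHigh}} and {{numLow}} are derived here is my assumption about what the surrounding code does (hand every server floor(mean) regions, then spread the leftovers one per server), and the class and method names are hypothetical.

```java
// Hypothetical helper class, for illustration only (not HBase code).
final class MinCostSketch {

    /** Lower bound on the cost when regions cannot be split across servers. */
    static double minBound(double total, int count) {
        double mean = total / count;
        // Assumption: after giving each server floor(mean) regions, the
        // leftover regions go one each to numHigh servers, which end up at ceil(mean).
        int numHigh = (int) (total - (Math.floor(mean) * count));
        int numLow = count - numHigh;
        // Expression quoted in the issue description.
        return (numHigh * (Math.ceil(mean) - mean))
            + (numLow * (mean - Math.floor(mean)));
    }
}
```

For 20 regions on 3 servers this gives {{numHigh}} = 2 and {{numLow}} = 1, so {{minBound(20, 3)}} is roughly 1.33, matching the 6, 7, 7 layout. When the regions divide evenly, as with 50 regions on 5 servers, the bound is 0.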

> Cost calculation in SLB may not be correct
> ------------------------------------------
>
>                 Key: HBASE-22265
>                 URL: https://issues.apache.org/jira/browse/HBASE-22265
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Balancer
>            Reporter: Biju Nair
>            Priority: Minor
>
> In 
> [CostFromArray|https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L1039]
>  method of SLB, the calculated value of {{max}}, which in turn is used to 
> scale the cost, "may" not be correct.
> {noformat}
> // Compute max as if all region servers had 0 and one had the sum of all costs.
> // This must be a zero sum cost for this to make sense.
> double max = ((count - 1) * mean) + (total - mean);{noformat}
> With the current calculation, {{max}} will end up with a value close to 
> twice the total of all the elements passed in the array (less the 
> mean value), while the comment above the calculation seems to imply that 
> {{max}} should be the sum of all costs, i.e. the value of the variable 
> {{total}}. 
>  
> Also it would be good to document the reasoning for the following calculation 
> in the same method. I can create a patch if anyone who is familiar with this 
> code can help understand the reasoning.
> {noformat}
> min = (numHigh * (Math.ceil(mean) - mean)) + (numLow * (mean - Math.floor(mean)));{noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
