[ https://issues.apache.org/jira/browse/HDFS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053633#comment-15053633 ]

Tsz Wo Nicholas Sze commented on HDFS-9502:
-------------------------------------------

Thanks Anu.  Copied [my earlier 
comment|https://issues.apache.org/jira/browse/HDFS-1312?focusedCommentId=15012417&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15012417]
 (with a bug fix) below.

We may simply formulate the calculation using weighted mean and weighted 
variance.
- dfsUsedRatio_i for storage i is defined the same as before, i.e.
{code}
dfsUsedRatio_i = dfsUsed_i/capacity_i.
{code}
- Define normalized weight using capacity as 
{code}
w_i = capacity_i / sum(capacity_j).
{code}
- Then, define
{code}
    nodeWeightedMean = sum(w_j * dfsUsedRatio_j), and
nodeWeightedVariance = sum(w_j * (dfsUsedRatio_j - nodeWeightedMean)^2).
{code}
We use nodeWeightedVariance (instead of nodeDataDensity) for comparison.  
Note that nodeWeightedMean is the same as idealStorage.
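
The two quantities above can be sketched as follows.  This is just an illustration of the formulas, not the actual DiskBalancer code; the class and method names are made up here.
```java
// Sketch (hypothetical names): weighted mean and weighted variance of
// dfsUsedRatio across the storages of one datanode.
public class WeightedStats {
  // capacity[i] and dfsUsed[i] describe storage i on the datanode.
  static double weightedMean(long[] capacity, long[] dfsUsed) {
    double totalCapacity = 0;
    for (long c : capacity) totalCapacity += c;
    double mean = 0;
    for (int i = 0; i < capacity.length; i++) {
      double w = capacity[i] / totalCapacity;            // w_i
      double ratio = (double) dfsUsed[i] / capacity[i];  // dfsUsedRatio_i
      mean += w * ratio;
    }
    return mean;  // same as idealStorage
  }

  static double weightedVariance(long[] capacity, long[] dfsUsed) {
    double totalCapacity = 0;
    for (long c : capacity) totalCapacity += c;
    double mean = weightedMean(capacity, dfsUsed);
    double variance = 0;
    for (int i = 0; i < capacity.length; i++) {
      double w = capacity[i] / totalCapacity;
      double ratio = (double) dfsUsed[i] / capacity[i];
      variance += w * (ratio - mean) * (ratio - mean);
    }
    return variance;
  }
}
```
For example, two storages of capacity 100 and 300 with 100 and 0 bytes used give a weighted mean of 0.25 and a weighted variance of 0.1875, while a perfectly balanced node has variance 0.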

- Note also that the calculation of nodeWeightedVariance can be simplified as 
{code}
nodeWeightedVariance = sum(w_j * dfsUsedRatio_j^2) - nodeWeightedMean^2.
{code}
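
This simplification is the standard identity Var = E[X^2] - (E[X])^2, which holds because the weights sum to 1.  A small check of the identity (illustrative code, not from DiskBalancer):
```java
// Sketch: for weights w_j summing to 1, the direct weighted variance
// sum(w_j * (r_j - mean)^2) equals sum(w_j * r_j^2) - mean^2.
public class VarianceIdentity {
  // Returns { directForm, simplifiedForm } so the two can be compared.
  static double[] bothForms(double[] w, double[] r) {
    double mean = 0, sumOfSquares = 0;
    for (int i = 0; i < w.length; i++) {
      mean += w[i] * r[i];              // weighted mean
      sumOfSquares += w[i] * r[i] * r[i];
    }
    double direct = 0;
    for (int i = 0; i < w.length; i++) {
      direct += w[i] * (r[i] - mean) * (r[i] - mean);
    }
    double simplified = sumOfSquares - mean * mean;
    return new double[] { direct, simplified };
  }
}
```
The simplified form needs only one pass over the storages, whereas the direct form needs the mean first.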


> DiskBalancer : Replace Node and Data Density with Weighted Mean and Variance
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-9502
>                 URL: https://issues.apache.org/jira/browse/HDFS-9502
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>
> We use a notion called Data Density which is similar to the weighted mean 
> and variance. Make sure that the computations map directly to these concepts, 
> since they are easier to understand than the density as currently defined in 
> Disk Balancer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)