[
https://issues.apache.org/jira/browse/HDFS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053633#comment-15053633
]
Tsz Wo Nicholas Sze commented on HDFS-9502:
-------------------------------------------
Thanks Anu. Copied [my earlier
comment|https://issues.apache.org/jira/browse/HDFS-1312?focusedCommentId=15012417&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15012417]
(with a bug fix) below.
We may simply formulate the calculation using weighted mean and weighted
variance.
- dfsUsedRatio_i for storage i is defined the same as before, i.e.
{code}
dfsUsedRatio_i = dfsUsed_i/capacity_i.
{code}
- Define normalized weight using capacity as
{code}
w_i = capacity_i / sum(capacity_j).
{code}
- Then, define
{code}
nodeWeightedMean = sum(w_j * dfsUsedRatio_j), and
nodeWeightedVariance = sum(w_j * (dfsUsedRatio_j - nodeWeightedMean)^2).
{code}
We use nodeWeightedVariance (instead of nodeDataDensity) for comparisons.
Note that nodeWeightedMean is the same as idealStorage.
- Note also that the calculation of nodeWeightedVariance can be simplified as
{code}
nodeWeightedVariance = sum(w_j * dfsUsedRatio_j^2) - nodeWeightedMean^2.
{code}
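The formulas above can be sanity-checked side by side; here is a minimal Java sketch (hypothetical class and method names, not the actual DiskBalancer code), which computes the mean by definition and the variance by the simplified formula:
```java
// Minimal sketch (hypothetical names, not DiskBalancer code): computes
// nodeWeightedMean and nodeWeightedVariance for one datanode's storages.
public class WeightedVarianceSketch {

  /** Returns {nodeWeightedMean, nodeWeightedVariance}. */
  static double[] weightedStats(long[] dfsUsed, long[] capacity) {
    double totalCapacity = 0;
    for (long c : capacity) {
      totalCapacity += c;
    }
    double mean = 0;
    double sumSq = 0;
    for (int i = 0; i < capacity.length; i++) {
      double w = capacity[i] / totalCapacity;           // w_i = capacity_i / sum(capacity_j)
      double ratio = (double) dfsUsed[i] / capacity[i]; // dfsUsedRatio_i
      mean += w * ratio;                                // nodeWeightedMean
      sumSq += w * ratio * ratio;                       // sum(w_j * dfsUsedRatio_j^2)
    }
    // Simplified form: sum(w_j * dfsUsedRatio_j^2) - nodeWeightedMean^2.
    return new double[] { mean, sumSq - mean * mean };
  }

  public static void main(String[] args) {
    // Two storages of equal capacity, 50% and 10% full:
    // mean = 0.3, variance = 0.5*(0.2)^2 + 0.5*(-0.2)^2 = 0.04.
    double[] stats = weightedStats(new long[] {50, 10}, new long[] {100, 100});
    System.out.println("mean=" + stats[0] + " variance=" + stats[1]);
  }
}
```
Since the weights sum to 1, the simplified form agrees with the definitional `sum(w_j * (dfsUsedRatio_j - nodeWeightedMean)^2)`.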
> DiskBalancer : Replace Node and Data Density with Weighted Mean and Variance
> ----------------------------------------------------------------------------
>
> Key: HDFS-9502
> URL: https://issues.apache.org/jira/browse/HDFS-9502
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Anu Engineer
> Assignee: Anu Engineer
>
> We use a notion called Data Density which is similar to weighted
> mean and variance. Make sure the computations map directly to these concepts,
> since they are easier to understand than the density as currently defined in
> DiskBalancer.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)