[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012417#comment-15012417
]
Tsz Wo Nicholas Sze commented on HDFS-1312:
-------------------------------------------
Hi Anu, the design doc looks good in general.
I think we don't need to define volumeDataDensity and nodeDataDensity in
Section 4.1. We may simply formulate the calculation using weighted mean and
weighted variance.
- dfsUsedRatio_i for storage i is defined the same as before, i.e.
{code}
dfsUsedRatio_i = dfsUsed_i/capacity_i.
{code}
- Define normalized weight using capacity as
{code}
w_i = capacity_i / sum(capacity_i).
{code}
- Then, define
{code}
nodeWeightedMean = sum(w_i * dfsUsedRatio_i), and
nodeWeightedVariance = sum(w_i * (dfsUsedRatio_i - nodeWeightedMean)^2).
{code}
We then use nodeWeightedVariance (instead of nodeDataDensity) for the comparison.
Note that nodeWeightedMean is the same as idealStorage.
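For illustration, the calculation above could be sketched as follows. This is only a minimal sketch, not the disk balancer's code: the function name weighted_mean_and_variance, its list-based arguments, and the sample numbers are assumptions made for the example, and it assumes every volume has a positive capacity.

```python
from typing import Sequence, Tuple


def weighted_mean_and_variance(
    dfs_used: Sequence[float], capacity: Sequence[float]
) -> Tuple[float, float]:
    """Return (nodeWeightedMean, nodeWeightedVariance) for one datanode.

    Hypothetical helper: dfs_used[i] and capacity[i] describe storage i.
    """
    if len(dfs_used) != len(capacity) or not capacity:
        raise ValueError("need one dfsUsed value per capacity value")
    if any(c <= 0 for c in capacity):
        raise ValueError("every capacity must be positive")

    total_capacity = sum(capacity)

    # dfsUsedRatio_i = dfsUsed_i / capacity_i
    ratios = [u / c for u, c in zip(dfs_used, capacity)]

    # w_i = capacity_i / sum(capacity_i), so the weights sum to 1
    weights = [c / total_capacity for c in capacity]

    # nodeWeightedMean = sum(w_i * dfsUsedRatio_i)
    mean = sum(w * r for w, r in zip(weights, ratios))

    # nodeWeightedVariance = sum(w_i * (dfsUsedRatio_i - nodeWeightedMean)^2)
    variance = sum(w * (r - mean) ** 2 for w, r in zip(weights, ratios))

    return mean, variance


if __name__ == "__main__":
    # Made-up numbers: a small, nearly full volume and a large, mostly empty one.
    skewed = weighted_mean_and_variance([90, 30], [100, 300])
    # The same total data spread so that every volume has the same used ratio.
    even = weighted_mean_and_variance([30, 90], [100, 300])
    print("skewed:", skewed)
    print("even:  ", even)
```

Because the weights are proportional to capacity, nodeWeightedMean works out to sum(dfsUsed_i) / sum(capacity_i), which is why it matches idealStorage. The variance is zero (up to floating-point rounding) when all volumes have the same used ratio and grows as the ratios diverge, so a node with a larger nodeWeightedVariance is less balanced.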
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.