[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944247#comment-14944247
]
Anu Engineer commented on HDFS-1312:
------------------------------------
[~templedf] Thanks for your comments; please see my thoughts below.
bq. Given HDFS-1804, I think [~steve_l]'s original proposal of a balancer
script that can be run manually while the DN is offline sounds like a
simpler/safer/better approach. Since the primary remaining source of imbalance
is disk failure, an offline process seems sensible. What's the main motivation
for building an online-balancer?
# We added support for hot-swapping disks in HDFS-1362, so this feature
complements that. Offline balancing was a good idea when [~steve_l] proposed
it, but I am not sure it still helps now that hot-swap is supported.
# There are a large number of clusters that use round-robin scheduling. I have
been looking for data on HDFS-1804 deployment (in fact, the proposal discusses
that issue); it would be good if you have some. Most of the clusters I am
seeing (anecdotal, I know) are based on round-robin scheduling. Please also
see the LinkedIn thread I refer to in the proposal; you will see customers are
looking for this data as well. This has been a painful issue that multiple
customers have complained about, and addressing it would improve the usability
of HDFS. Please look at section 2 of the proposal for a large set of issues
that customers have been complaining about, and how hard some of the
workarounds are.
bq. The reporting aspect should perhaps be handled under HDFS-1121.
In order to have a tool that can do disk balancing, the user usually starts
with a simple question: "Which machines in my cluster need rebalancing?" There
is an I/O cost involved in running any disk balancing operation, and many
times you need to do this proactively, since disks can go out of balance even
without a failure (see some earlier comments in this JIRA). This proposal says
that we will create a metric that describes the ideal data distribution and
how far a given node is from that ideal, which will allow us to compute the
set of nodes that would benefit from disk balancing and what the actual
balancing should look like (that is, what data movements we are planning). I
don't think HDFS-1121 is talking about this requirement. I do think being able
to answer "which machines need disk balancing" makes for a good operational
interface for HDFS.
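As an illustrative sketch (not the proposed patch itself), such a metric could
compare each volume's used fraction against the node-wide ideal used fraction;
the volume names and numbers below are made up:

```python
def node_density(volumes):
    """Score how far a node is from the ideal data distribution.

    `volumes` maps a volume name -> (used_bytes, capacity_bytes).
    A node whose volumes all sit at the node-wide used fraction scores
    0.0; larger scores mean more imbalance, so nodes can be ranked to
    answer "which machines need disk balancing".
    """
    total_used = sum(used for used, _ in volumes.values())
    total_cap = sum(cap for _, cap in volumes.values())
    ideal = total_used / total_cap  # node-wide used fraction
    # Sum of absolute deviations of each volume from the ideal.
    return sum(abs(ideal - used / cap) for used, cap in volumes.values())

# Hypothetical node: one nearly full disk, one nearly empty one.
vols = {"/data1": (900, 1000), "/data2": (100, 1000)}
print(node_density(vols))  # ideal is 0.5, so |0.5-0.9| + |0.5-0.1| ≈ 0.8
```

A balancer could then plan moves from volumes above the ideal fraction to
volumes below it until the score drops under some threshold.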
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.
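The first option quoted above, weighting less-used disks more heavily when
placing new blocks, could be sketched as follows. This is an illustrative
example of space-weighted random selection, not the DataNode's actual
volume-choosing policy, and all names are made up:

```python
import random

def choose_volume(volumes, rng=random):
    """Pick a storage directory for a new block, weighted by free space.

    `volumes` maps a volume name -> (used_bytes, capacity_bytes).
    Emptier disks are proportionally more likely to be chosen, so all
    spindles stay in use while disk usage equalizes over time.
    """
    names = list(volumes)
    free = [cap - used for used, cap in volumes.values()]
    if sum(free) == 0:
        raise ValueError("all volumes are full")
    return rng.choices(names, weights=free, k=1)[0]

# A disk with 900 bytes free is chosen ~9x as often as one with 100 free.
vols = {"/data1": (100, 1000), "/data2": (900, 1000)}
print(choose_volume(vols))
```

As the write-heavy-environment point above notes, this equalizes usage only
over time through new writes; it does not move existing blocks, which is why
local rebalancing is listed as a separate option.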
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)