[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088505#comment-15088505
]
Andrew Wang commented on HDFS-1312:
-----------------------------------
Hi Anu, thanks for the reply,
bq. Our own experience from the field is that many customers routinely run into
this issue, with and without new drives being added, so we are tackling both
use cases.
Have these customers tried out HDFS-1804? We've had users complain about
imbalance before too, and after enabling HDFS-1804 they reported no further
issues. This is why I'm trying to separate the two use cases; heterogeneous
disks are better addressed by HDFS-1804 since it works automatically, leaving
hotswap to this JIRA (HDFS-1312).
If you're interested, feel free to take HDFS-8538 from me. I really think it'll
fix the majority of imbalance issues outside of hotswap.
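(For anyone following along: enabling HDFS-1804 is just a config change on the
DN. A minimal sketch in {{Configuration}} terms; the two tuning keys and the
default values below are from memory, so double-check them against
hdfs-default.xml before relying on this.)
{code:java}
import org.apache.hadoop.conf.Configuration;

public class EnableAvailableSpacePolicy {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Swap the DN's default round-robin volume choosing policy for the
    // HDFS-1804 available-space policy.
    conf.set("dfs.datanode.fsdataset.volume.choosing.policy",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset."
            + "AvailableSpaceVolumeChoosingPolicy");
    // Volumes whose free space is within this many bytes of each other are
    // considered balanced and picked round-robin (default 10 GB).
    conf.setLong(
        "dfs.datanode.available-space-volume-choosing-policy."
            + "balanced-space-threshold",
        10L * 1024 * 1024 * 1024);
    // Fraction of new block allocations steered toward the volumes with
    // more free space (default 0.75).
    conf.setFloat(
        "dfs.datanode.available-space-volume-choosing-policy."
            + "balanced-space-preference-fraction",
        0.75f);
  }
}
{code}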
bq. (self-quote) I think most of this functionality should live in the DN since
it's better equipped to do IO throttling and mutual exclusion.
I think I was too brief before, let me expand a little. I imagine basically
everything (discover, planning, execute) happening in the DN:
# Client sends RPC to DN telling it to balance with some parameters
# DN examines its volumes, constructs some {{Info}} object to hold the data
# DN thread calls Planner passing the {{Info}} object, which outputs a {{Plan}}
# {{Plan}} is queued at DN executor pool, which does the moves
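To make that shape concrete, here's a rough sketch of the DN-side pieces;
every class and signature below is made up for illustration, not proposed API:
{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the DN-internal flow; none of these types exist.
class IntraDnBalancer {
  /** Per-volume snapshot built by probing the DN's own data dirs (step 2). */
  static class Info {
    List<VolumeReport> volumes;
  }

  static class VolumeReport {
    String dir;
    long capacity;
    long used;
  }

  /** Ordered block moves between volumes, produced by the Planner (step 3). */
  static class Plan {
    List<MoveStep> steps;
  }

  static class MoveStep {
    String sourceDir;
    String targetDir;
    long bytesToMove;
  }

  /** Pure function from Info to Plan: same input, same output. */
  interface Planner {
    Plan plan(Info info);
  }

  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  /** Entry point behind the "start balancing" RPC (step 1). */
  Future<?> balance(Planner planner, Info info) {
    Plan plan = planner.plan(info);
    // Step 4: the moves are queued and run by a DN executor thread.
    return executor.submit(() -> execute(plan));
  }

  private void execute(Plan plan) {
    // Throttled block moves go here, reusing existing DN mover machinery.
  }
}
{code}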
Attempting to address your points one by one:
# When I mentioned removing the discover phase, I meant the NN communication.
Here, the DN just probes its own volume information. Does it need to talk to
the NN for anything else?
# Assuming no need for NN communication, there's hardly any new code. The only
new RPCs would be one to start balancing and one to monitor ongoing balancing;
the rest of the communication happens between DN threads.
# Cluster-wide disk information is already handled by monitoring tools, no? The
admin gets the Ganglia alert saying some node is imbalanced, admin triggers
intranode balancer, admin keeps looking at Ganglia to see if it's fixed. I
don't think adding our own monitoring of the same information helps, when
Ganglia etc. are already available, in-use, and understood by admins.
# I don't think this conflicts with the debuggability goal. The DN can dump the
{{Info}} (and even the {{Plan}}) object if requested, to the log or somewhere
in a data dir. Then we can pass it into a unit test to debug, as sketched after
this list. This unit test doesn't need to be a minicluster either: if we write
the Planner correctly, the {{Info}} should encapsulate all the state, and we're
just running an algorithm to make a {{Plan}}. The Planner being inside the DN
doesn't change this.
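Here's the kind of minicluster-free test I have in mind, reusing the
hypothetical types from the sketch above ({{GreedyPlanner}} is likewise made
up, standing in for whatever Planner implementation we'd write):
{code:java}
import static org.junit.Assert.assertFalse;

import java.util.Arrays;
import org.junit.Test;

// Hypothetical planner test: no minicluster, no DN, just Info -> Plan.
public class TestPlanner {
  @Test
  public void planMovesOffTheFullVolume() {
    // Build (or deserialize from a DN dump) an Info with one nearly full
    // volume and one nearly empty one.
    IntraDnBalancer.Info info = new IntraDnBalancer.Info();
    info.volumes = Arrays.asList(
        volume("/data/1", 1_000_000_000L, 950_000_000L),
        volume("/data/2", 1_000_000_000L, 50_000_000L));

    IntraDnBalancer.Plan plan = new GreedyPlanner().plan(info);

    // Any reasonable planner should propose at least one move.
    assertFalse(plan.steps.isEmpty());
  }

  private static IntraDnBalancer.VolumeReport volume(
      String dir, long capacity, long used) {
    IntraDnBalancer.VolumeReport v = new IntraDnBalancer.VolumeReport();
    v.dir = dir;
    v.capacity = capacity;
    v.used = used;
    return v;
  }
}
{code}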
Thanks for the pointer to the mover logic; I wasn't aware we had that. I asked
about this since section 4.2 of the proposal doc says "copy block from A to B
and verify". Adding a note that says "we use the existing moveBlockAcrossStorage
method" is a great answer.
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)