[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090085#comment-15090085
]
Anu Engineer commented on HDFS-1312:
------------------------------------
Hi [~andrew.wang],
Thanks for your comments. Here are my thoughts on these issues.
bq. I don't follow this line of reasoning; don't concerns about using a new
feature apply to a hypothetical HDFS-1312 implementation too?
I think it comes down to risk. Let us look at the worst-case scenarios
possible with HDFS-1804 and HDFS-1312. HDFS-1804 is a cluster-wide change:
it is always on, and every write goes through it, so it can have a
cluster-wide impact, including an impact on the various workloads in the
cluster.
With HDFS-1312, however, the worst case is that we take a single node
off-line, since it is an external tool that operates off-line on one node at
a time. Another important difference is that it is not always on: it runs,
does its work, and goes away. So the amount of risk to the cluster,
especially from an administrator's point of view, is different between these
two approaches.
bq. Why do we lose this? Can't the DN dump this somewhere?
We can, but then we would need to add RPCs to the datanode to pull that data
out and display the change on the node. In the current approach, we write the
snapshot to the local disk and compute the diff later against the sources; no
datanode operation is needed.
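To make the off-line diff concrete, here is a minimal sketch. The snapshot
format assumed here (one "volumePath usedBytes" pair per line) is purely
illustrative; the real format is whatever the disk-balancer design specifies.
{noformat}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: diff two node snapshots taken before and after
// a disk-balancer run. Each snapshot file is assumed to hold one
// "volumePath usedBytes" pair per line.
public class SnapshotDiff {

  static Map<String, Long> load(String file) throws IOException {
    Map<String, Long> usage = new HashMap<>();
    for (String line : Files.readAllLines(Paths.get(file))) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length == 2) {
        usage.put(parts[0], Long.parseLong(parts[1]));
      }
    }
    return usage;
  }

  public static void main(String[] args) throws IOException {
    Map<String, Long> before = load(args[0]);
    Map<String, Long> after = load(args[1]);
    // Print the per-volume change in used bytes; both snapshots live on
    // local disk, so no datanode RPC is involved.
    for (Map.Entry<String, Long> e : before.entrySet()) {
      long newUsed = after.getOrDefault(e.getKey(), 0L);
      System.out.printf("%s: %+d bytes%n", e.getKey(), newUsed - e.getValue());
    }
  }
}
{noformat}
Running this against the before and after snapshot files prints the
per-volume delta, i.e. the change on the node, without touching the datanode.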
bq. This is an interesting point I was not aware of. Is the goal here to do
inter-DN moving?
No, the goal is *intra-DN*; I was referring to {noformat}hdfs mover{noformat},
not to {noformat}hdfs balancer{noformat}.
bq. If it's only for intra-DN moving, then it could still live in the DN.
Completely agree, all block moving code will be in DN.
bq. This is also why I brought up HDFS-8538. If HDFS-1804 is the default volume
choosing policy, we won't see imbalance outside of hotswap.
Agreed, and it is a goal we should work towards. From the comments on
HDFS-8538, it looks like we may have to make some minor tweaks before we can
commit it. I can look at it after HDFS-1312.
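For context, opting in to the HDFS-1804 behavior is a single datanode setting
today. A sketch of the hdfs-site.xml entry (property and class names as added
by HDFS-1804; worth double-checking against the release you run):
{noformat}
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
{noformat}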
bq. The point I was trying to make is that HDFS-1804 addresses the imbalance
issues besides hotswap, so we eliminate the alerts in the first place. Hotswap
is an operation explicitly undertaken by the admin, so the admin will know to
also run the intra-DN balancer.
Since we have both made this point many times, I am going to agree with what
you are saying. Even if we assume that hotswap (or a normal swap) is the only
use case for disk balancing, in a large cluster many disks will have failed
at any given time. So when a cluster gets a number of disks replaced, the
proposed interface makes life easier for admins: they can replace a bunch of
disks on various machines and then ask the system to find and fix those
nodes. I just think the interface we are building makes admins' lives easier,
and it takes nothing away from the use cases you describe.
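As an illustration, the workflow I have in mind looks roughly like this; the
command names follow the attached proposal and are illustrative, not final:
{noformat}
# Find the most unbalanced nodes across the cluster.
hdfs diskbalancer -report -top 10
# Compute a move plan off-line for one of them.
hdfs diskbalancer -plan datanode1.example.com
# Apply the plan on that node when convenient.
hdfs diskbalancer -execute datanode1.plan.json
{noformat}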
bq. This is an aspirational goal, but when debugging a prod cluster we almost
certainly also want to see the DN log too
Right now we have actually met that aspirational goal: we capture a snapshot
of the node, which allows us both to debug and to simulate what is happening
with the disk balancer off-line.
bq. Would it help to have a phone call about this? We have a lot of points
flying around, might be easier to settle this via a higher-bandwidth medium.
I think that is an excellent idea; I would love to chat with you in person. I
will set up a meeting and post the meeting info in this JIRA.
I really appreciate your input and the thoughtful discussion we are having. I
hope to speak with you in person soon.
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.
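As a rough sketch of the first option above (weighting less-used disks more
heavily when placing new blocks), the idea is a free-space-weighted random
choice. The class and field names below are hypothetical; HDFS-1804's
AvailableSpaceVolumeChoosingPolicy is the real implementation of this idea.
{noformat}
import java.util.List;
import java.util.Random;

// Sketch only: pick a volume with probability proportional to its free
// space, so emptier disks fill faster and usage equalizes over time.
public class WeightedVolumeChooser {

  static class Volume {
    final String path;
    final long freeBytes;
    Volume(String path, long freeBytes) {
      this.path = path;
      this.freeBytes = freeBytes;
    }
  }

  private final Random random = new Random();

  Volume choose(List<Volume> volumes) {
    long totalFree = volumes.stream().mapToLong(v -> v.freeBytes).sum();
    long pick = (long) (random.nextDouble() * totalFree);
    for (Volume v : volumes) {
      pick -= v.freeBytes;
      if (pick < 0) {
        return v;
      }
    }
    // Fallback for rounding at the upper edge of the range.
    return volumes.get(volumes.size() - 1);
  }
}
{noformat}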