[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090085#comment-15090085
]
Anu Engineer commented on HDFS-1312:
------------------------------------
Hi [~andrew.wang],
Thanks for your comments. Here are my thoughts on these issues.
bq. I don't follow this line of reasoning; don't concerns about using a new
feature apply to a hypothetical HDFS-1312 implementation too?
I think it comes down to risk. Let us look at the worst-case scenarios
possible with HDFS-1804 and HDFS-1312. HDFS-1804 is a cluster-wide change:
it is always on, and every write goes through it, so it can have a
cluster-wide impact, including an impact on the various workloads in the
cluster.
With HDFS-1312, however, the worst case is that we take a single node
off-line, since it is an external tool that operates off-line on one node at
a time. Another important difference is that it is not always on: it runs,
does its work, and goes away. So the amount of risk to the cluster,
especially from an administrator's point of view, is different between these
two approaches.
bq. Why do we lose this? Can't the DN dump this somewhere?
We can, but then we would need to add RPCs to the datanode to pull that data
out and display the change on the node. In the current approach, we write the
snapshot to the local disk and compute the diff later against the sources; no
datanode operation is needed.
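To make the off-line diff concrete, here is a minimal sketch. The snapshot
format assumed here (one "volumePath usedBytes" pair per line) is purely
illustrative; the real format is whatever the disk-balancer design specifies.
{noformat}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: diff two node snapshots taken before and after
// a disk-balancer run. Each snapshot file is assumed to hold one
// "volumePath usedBytes" pair per line.
public class SnapshotDiff {

  static Map<String, Long> load(String file) throws IOException {
    Map<String, Long> usage = new HashMap<>();
    for (String line : Files.readAllLines(Paths.get(file))) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length == 2) {
        usage.put(parts[0], Long.parseLong(parts[1]));
      }
    }
    return usage;
  }

  public static void main(String[] args) throws IOException {
    Map<String, Long> before = load(args[0]);
    Map<String, Long> after = load(args[1]);
    // Print the per-volume change in used bytes; both snapshots live on
    // local disk, so no datanode RPC is involved.
    for (Map.Entry<String, Long> e : before.entrySet()) {
      long newUsed = after.getOrDefault(e.getKey(), 0L);
      System.out.printf("%s: %+d bytes%n", e.getKey(), newUsed - e.getValue());
    }
  }
}
{noformat}
Running this against the before and after snapshot files prints the
per-volume delta, i.e. the change on the node, without touching the datanode.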
bq. This is an interesting point I was not aware of. Is the goal here to do
inter-DN moving?
No, the goal is *intra-DN*; I was referring to {noformat}hdfs mover{noformat},
not to {noformat}hdfs balancer{noformat}.
bq. If it's only for intra-DN moving, then it could still live in the DN.
Completely agree, all block moving code will be in DN.
bq. This is also why I brought up HDFS-8538. If HDFS-1804 is the default volume
choosing policy, we won't see imbalance outside of hotswap.
Agreed, and it is a goal we should work towards. From the comments on
HDFS-8538, it looks like we may have to make some minor tweaks before we can
commit it. I can look at it after HDFS-1312.
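For context, opting in to the HDFS-1804 behavior is a single datanode setting
today. A sketch of the hdfs-site.xml entry (property and class names as added
by HDFS-1804; worth double-checking against the release you run):
{noformat}
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
{noformat}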
bq. The point I was trying to make is that HDFS-1804 addresses the imbalance
issues besides hotswap, so we eliminate the alerts in the first place. Hotswap
is an operation explicitly undertaken by the admin, so the admin will know to
also run the intra-DN balancer.
Since we have both made this point many times, I am going to agree with what
you are saying. Even if we assume that hotswap (or a normal swap) is the only
use case for disk balancing, in a large cluster many disks will have failed
at any given time. So when a cluster gets a number of disks replaced, the
proposed interface makes life easier for admins: they can replace a bunch of
disks on various machines and then ask the system to find and fix those
nodes. I just think the interface we are building makes admins' lives easier,
and it takes nothing away from the use cases you describe.
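As an illustration, the workflow I have in mind looks roughly like this; the
command names follow the attached proposal and are illustrative, not final:
{noformat}
# Find the most unbalanced nodes across the cluster.
hdfs diskbalancer -report -top 10
# Compute a move plan off-line for one of them.
hdfs diskbalancer -plan datanode1.example.com
# Apply the plan on that node when convenient.
hdfs diskbalancer -execute datanode1.plan.json
{noformat}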
bq. This is an aspirational goal, but when debugging a prod cluster we almost
certainly also want to see the DN log too
Right now we have actually met that aspirational goal: we capture a snapshot
of the node, which allows us both to debug and to simulate what is happening
with the disk balancer off-line.
bq. Would it help to have a phone call about this? We have a lot of points
flying around, might be easier to settle this via a higher-bandwidth medium.
I think that is an excellent idea; I would love to chat with you in person. I
will set up a meeting and post the meeting info in this JIRA.
I really appreciate your input and the thoughtful discussion we are having. I
hope to speak with you in person soon.
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles, and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.
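As a rough sketch of the first option above (weighting less-used disks more
heavily when placing new blocks), the idea is a free-space-weighted random
choice. The class and field names below are hypothetical; HDFS-1804's
AvailableSpaceVolumeChoosingPolicy is the real implementation of this idea.
{noformat}
import java.util.List;
import java.util.Random;

// Sketch only: pick a volume with probability proportional to its free
// space, so emptier disks fill faster and usage equalizes over time.
public class WeightedVolumeChooser {

  static class Volume {
    final String path;
    final long freeBytes;
    Volume(String path, long freeBytes) {
      this.path = path;
      this.freeBytes = freeBytes;
    }
  }

  private final Random random = new Random();

  Volume choose(List<Volume> volumes) {
    long totalFree = volumes.stream().mapToLong(v -> v.freeBytes).sum();
    long pick = (long) (random.nextDouble() * totalFree);
    for (Volume v : volumes) {
      pick -= v.freeBytes;
      if (pick < 0) {
        return v;
      }
    }
    // Fallback for rounding at the upper edge of the range.
    return volumes.get(volumes.size() - 1);
  }
}
{noformat}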