[ 
https://issues.apache.org/jira/browse/HDFS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109954#comment-14109954
 ] 

Colin Patrick McCabe commented on HDFS-6900:
--------------------------------------------

I forget the JIRA numbers, but this has been discussed a bunch already.  The 
short answer is that we can't assume that HDFS is the only thing using the 
disks, so we can't use df to find out how much space the DataNode is using: 
df reports usage for the whole filesystem, not just the DataNode's directories.
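
As a rough illustration of the difference (this is not HDFS code; the class 
and method names are made up for the sketch, and only standard java.nio.file 
calls are used), a df-style number covers everything on the filesystem, while 
a du-style number covers only the files under one directory:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class UsageSketch {

    // df-style: bytes in use on the whole filesystem that holds dir,
    // no matter which process or directory they belong to.
    static long dfStyleUsed(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getTotalSpace() - store.getUnallocatedSpace();
    }

    // du-style: walk the tree and sum the lengths of regular files under dir.
    // Assumption: this sums apparent file sizes, whereas the real du counts
    // allocated disk blocks, so the two can differ somewhat.
    static long duStyleUsed(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                .mapToLong(p -> p.toFile().length())
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("df-style used bytes: " + dfStyleUsed(dir));
        System.out.println("du-style used bytes: " + duStyleUsed(dir));
    }
}
```

On a disk that holds nothing but DataNode data the two numbers should be 
close; on a shared disk the df-style number also includes everyone else's 
files, which is the problem described above.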

That being said, my experience has been that most users would be absolutely 
happy with df rather than du, because most users don't share their HDFS disks 
with other systems / nodes.  In fact, I even saw a HOWTO online that instructed 
users to symlink {{/usr/bin/du}} to {{/usr/bin/df}} when using Hadoop :(

It would be nice if we could somehow default to df for those people.


> Eliminate DU thread per block pool slice
> ----------------------------------------
>
>                 Key: HDFS-6900
>                 URL: https://issues.apache.org/jira/browse/HDFS-6900
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: Arpit Agarwal
>
> We use one DU thread per block pool slice to compute disk usage information. 
> In addition to the thread overhead, this results in the disk usage information 
> being out of date for up to 10 minutes at a time. We can refresh it more 
> frequently, but then we'd be launching a shell command per block pool slice 
> even more often.



--
This message was sent by Atlassian JIRA
(v6.2#6252)