[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172743#comment-14172743
 ] 

Byron Wong commented on HADOOP-6857:
------------------------------------

*Scenario 1*: "/test" is a snapshottable directory with a file "a" that has 41 
bytes, replication factor 3.
We run {{hadoop fs -du /test}}:
{code}
41  123  /test/a
{code}
which is consistent with what we get when we run {{hadoop fs -du -s /test}}:
{code}
41  123  /test
{code}
When we create a snapshot "ss1" and rerun the -du commands, we still get the 
same results as above.
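For context, the second column that -du prints is the raw space consumed, i.e. the file's logical length multiplied by its replication factor (41 × 3 = 123 here). A minimal sketch of that arithmetic (class and method names are illustrative, not Hadoop API):

```java
public class RawUsage {
    // Raw disk usage = logical file length * replication factor.
    // This mirrors the second column of "hadoop fs -du" output.
    static long rawUsage(long lengthBytes, short replication) {
        return lengthBytes * replication;
    }

    public static void main(String[] args) {
        // A 41-byte file with replication factor 3, as in Scenario 1.
        System.out.println(rawUsage(41L, (short) 3)); // prints 123
    }
}
```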

Let's say we now run {{hadoop fs -mv /test/a /test/b}}.
Now, when we run {{hadoop fs -du /test}}, we get:
{code}
41  123  /test/b
{code}
which is inconsistent with what we see when we run {{hadoop fs -du -s /test}}:
{code}
41  246  /test
{code}

If we repeat this process (i.e. create a snapshot, rename /test/b back to 
/test/a), the deviation between the two commands grows with each iteration.
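The numbers above are consistent with the summary command counting the renamed file's snapshotted copy as additional raw usage. A hypothetical model of the observed growth (not Hadoop code; it only reproduces the reported figures under that assumption):

```java
public class DuDeviation {
    // Hypothetical model of the observed behavior: "-du" on the live file
    // stays constant, while after n snapshot+rename cycles "-du -s" appears
    // to count one extra replicated copy per cycle.
    static long duLiveFile(long rawBytes) {
        return rawBytes;
    }

    static long duSummary(long rawBytes, int snapshotRenameCycles) {
        return rawBytes * (snapshotRenameCycles + 1);
    }

    public static void main(String[] args) {
        long raw = 41L * 3; // 123 raw bytes for the 41-byte, 3x-replicated file
        System.out.println(duLiveFile(raw));   // prints 123
        System.out.println(duSummary(raw, 1)); // prints 246, as in the report
    }
}
```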

> FsShell should report raw disk usage including replication factor
> -----------------------------------------------------------------
>
>                 Key: HADOOP-6857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6857
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Alex Kozlov
>            Assignee: Byron Wong
>         Attachments: HADOOP-6857.patch, show-space-consumed.txt
>
>
> Currently FsShell reports HDFS usage with the "hadoop fs -dus <path>" 
> command. Since the replication level is set per file, it would be nice to 
> also report raw disk usage including the replication factor (maybe "hadoop 
> fs -dus -raw <path>"?). This would allow assessing resource usage more 
> accurately. -- Alex K



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
