[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396230#comment-15396230
 ] 

Xiao Chen commented on HDFS-8986:
---------------------------------

Thanks a lot [~jojochuang] for the review! New patch attached with comments 
inline:
bq. in ContentSummary.java, the name of setter method for snapshotLength, 
snapshotFileCount, snapshotDirectoryCount and snapshotSpaceConsumed should be 
prefixed by "set". E.g. setSnapshotLength
I agree setXXX is a better setter name. The names here follow the existing 
setter method naming for consistency. It's a public (though evolving) API, so 
I'd like to keep the change minimal.
bq. in ContentSummary#equals(), you may declare a ContentSummary object and 
typecast the {{to}} object to it, so as to avoid explicitly typecasting every 
method call. This is just a personal taste, not a big deal though.
Good idea, updated.
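
For readers following along, the single-cast pattern suggested above looks roughly like the sketch below. It is only an illustration: {{SummarySketch}} and its fields are hypothetical stand-ins, not the actual {{ContentSummary}} code from the patch.

```java
import java.util.Objects;

// Hypothetical stand-in for ContentSummary; not the code from the patch.
final class SummarySketch {
    private final long length;
    private final long snapshotLength;
    private final long snapshotFileCount;
    private final long snapshotDirectoryCount;

    SummarySketch(long length, long snapshotLength,
                  long snapshotFileCount, long snapshotDirectoryCount) {
        this.length = length;
        this.snapshotLength = snapshotLength;
        this.snapshotFileCount = snapshotFileCount;
        this.snapshotDirectoryCount = snapshotDirectoryCount;
    }

    @Override
    public boolean equals(Object to) {
        if (this == to) {
            return true;
        }
        if (!(to instanceof SummarySketch)) {
            return false;
        }
        // Cast once, then compare field by field without repeating the cast.
        SummarySketch right = (SummarySketch) to;
        return length == right.length
            && snapshotLength == right.snapshotLength
            && snapshotFileCount == right.snapshotFileCount
            && snapshotDirectoryCount == right.snapshotDirectoryCount;
    }

    @Override
    public int hashCode() {
        // Keep hashCode consistent with equals.
        return Objects.hash(length, snapshotLength,
            snapshotFileCount, snapshotDirectoryCount);
    }
}
```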
bq. Please update FileSystemShell.md to include the -x option for the usage of 
du.
Good catch! Updated.
bq. I don't understand this code in INodeDirectory, and I wonder if it has a 
bug. If I understand it correctly, the counts field and snapshotCounts field of 
summary object will be exactly the same. On the contrary, I think you may have 
to declare another method similar to 
DirectoryWithSnapshotFeature.computeContentSummary4Snapshot, but which computes 
content for snapshottable subdirectories and files only.
I think the current patch is correct. The code is a bit difficult to read 
through since the (great) change in HDFS-4995, but the high-level idea is that 
{{ContentCounts}} holds aggregated totals. You're right that the calculation in 
{{INodeDirectory#computeContentSummary}} aggregates the same values into both 
{{counts}} and {{snapshotCounts}}, but that's what we want. This way, the final 
calculation in {{FsUsage$Du#processPath}} can exclude the snapshot portion by 
computing (All - snapshotAll). 
I also added one more step in the test that creates a file after the snapshot 
is taken. Does that make sense?
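
To make the (All - snapshotAll) idea concrete, here is a minimal sketch under simplifying assumptions. The class {{DuExcludeSnapshotsSketch}}, its nested {{Counts}} type, and the byte values are all hypothetical; the real logic lives in {{INodeDirectory#computeContentSummary}} and {{FsUsage$Du#processPath}} and tracks more than a single length.

```java
// Hypothetical illustration only; not the Hadoop implementation.
final class DuExcludeSnapshotsSketch {

    /** Simplified aggregate that tracks only a length in bytes. */
    static final class Counts {
        private long length;          // "All": every byte that was aggregated
        private long snapshotLength;  // "snapshotAll": the snapshot portion

        /** Adds bytes to the total, and also to the snapshot total if flagged. */
        void add(long bytes, boolean fromSnapshot) {
            length += bytes;
            if (fromSnapshot) {
                // The same value lands in both totals on purpose.
                snapshotLength += bytes;
            }
        }

        long getLength() {
            return length;
        }

        long getSnapshotLength() {
            return snapshotLength;
        }
    }

    /** Returns the usage to report, optionally excluding the snapshot portion. */
    static long usage(Counts counts, boolean excludeSnapshots) {
        return excludeSnapshots
            ? counts.getLength() - counts.getSnapshotLength()
            : counts.getLength();
    }

    public static void main(String[] args) {
        Counts counts = new Counts();
        counts.add(800L, true);   // data that now exists only in a snapshot
        counts.add(100L, false);  // file created after the snapshot was taken

        System.out.println("with snapshots:    " + usage(counts, false));
        System.out.println("without snapshots: " + usage(counts, true));
    }
}
```

Because the snapshot bytes are counted in both totals, the subtraction leaves only the live data, which is what the second file in the updated test is meant to exercise.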

> Add option to -du to calculate directory space usage excluding snapshots
> ------------------------------------------------------------------------
>
>                 Key: HDFS-8986
>                 URL: https://issues.apache.org/jira/browse/HDFS-8986
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Gautam Gopalakrishnan
>            Assignee: Xiao Chen
>         Attachments: HDFS-8986.01.patch, HDFS-8986.02.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage in the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
