[
https://issues.apache.org/jira/browse/HDFS-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631433#comment-15631433
]
Andrew Wang commented on HDFS-10531:
------------------------------------
Thanks for working on this, Wei-Chiu. I have some code review comments, but
would like to start by asking about the use case. We already have {{hadoop fs
-count -q}} which shows us usage by storage type. If we want to do this for EC
too, I'd prefer we add functionality to the {{Count}} command.
If instead we want some way of printing out the EC policy set on paths, that
seems like it belongs in {{ls}} or even {{hdfs erasurecode}}.
Since EC doesn't relate to quota though, I think normally admins will be the
ones who care about how much data is EC, and at a cluster level. Cluster-wide
stats are thus a better place for these numbers.
Code comments:
* In ContentSummaryComputationContext, let's try to follow the recommended
modifier order:
http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html#toc-modifiers
* Should we say "redundancy" rather than "replication" in the docs / help? e.g.
{{disk_space_consumed_with_all_replicas}} and {{Display storage and replication
policy}}.
* There is commented-out new code in INodeFile.
* I think it'd look better to put the new columns at the end. This is less
likely to break parsing scripts too, since they normally parse left-to-right.
Since this output is tab-delimited, we also don't need brackets.
* Could you add a test for "-s" with "-t" ?
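To illustrate the point about appending columns: a minimal sketch, assuming a hypothetical tab-delimited layout of size followed by path, with any new columns placed after them. The class name, method name, and sample lines are made up for illustration and are not taken from the patch or from the real {{du}} output.

```java
import java.util.Arrays;

public class DuLineSketch {
    /**
     * Returns the first n tab-separated fields of a line and ignores
     * anything after them. Hypothetical helper, not part of Hadoop.
     */
    public static String[] leadingFields(String line, int n) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] all = line.split("\t", -1);
        if (all.length < n) {
            throw new IllegalArgumentException(
                "expected at least " + n + " fields, got " + all.length);
        }
        return Arrays.copyOf(all, n);
    }

    public static void main(String[] args) {
        // Assumed old layout: size, path.
        String oldLine = "104857600\t/home/rgao/ds";
        // Assumed new layout: same leading columns, extra columns appended.
        String newLine = "104857600\t/home/rgao/ds\tARCHIVE\tRS-DEFAULT-6-3-64k";

        System.out.println(Arrays.toString(leadingFields(oldLine, 2)));
        System.out.println(Arrays.toString(leadingFields(newLine, 2)));
    }
}
```

A script that reads fields left-to-right this way sees the same leading values for both lines. If the new columns were inserted at the front instead, the same script would read the storage type where it expects the size.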
> Add EC policy and storage policy related usage summarization function to dfs
> du command
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-10531
> URL: https://issues.apache.org/jira/browse/HDFS-10531
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0-alpha1
> Reporter: Rui Gao
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10531.001.patch
>
>
> Current du command output:
> {code}
> [ ~]$ hdfs dfs -du -h /home/rgao/
> 0 /home/rgao/.Trash
> 0 /home/rgao/.staging
> 100 M /home/rgao/ds
> 250 M /home/rgao/ds-2
> 200 M /home/rgao/noECBackup-ds
> 500 M /home/rgao/noECBackup-ds-2
> {code}
> For HDFS users and administrators, EC policy and storage policy related usage
> summarization would be very helpful when managing cluster storage. A mock-up
> of the proposed du output could look like the following.
> {code}
> [ ~]$ hdfs dfs -du -h -t( total, parameter to be added) /home/rgao
>
> 0 /home/rgao/.Trash
> 0 /home/rgao/.staging
> [Archive] [EC:RS-DEFAULT-6-3-64k] 100 M /home/rgao/ds
> [DISK] [EC:RS-DEFAULT-6-3-64k] 250 M /home/rgao/ds-2
> [DISK] [Replica] 200 M /home/rgao/noECBackup-ds
> [DISK] [Replica] 500 M /home/rgao/noECBackup-ds-2
>
> Total:
>
> [Archive][EC:RS-DEFAULT-6-3-64k] 100 M
> [Archive][Replica] 0 M
> [DISK] [EC:RS-DEFAULT-6-3-64k] 250 M
> [DISK] [Replica] 700 M
>
> [Archive][ALL] 100M
> [DISK] [ALL] 950M
> [ALL] [EC:RS-DEFAULT-6-3-64k] 350M
> [ALL] [Replica] 700M
> {code}
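For reference, the "Total" section of the mock-up above amounts to a roll-up over two dimensions. A minimal sketch, assuming sizes in MB and the four non-empty directories from the mock-up; the class, method names, and "storage|redundancy" key format are hypothetical and not taken from the patch.

```java
import java.util.Map;
import java.util.TreeMap;

public class DuTotalsSketch {
    /**
     * Adds sizeMb under the exact (storage, redundancy) pair and under
     * both "ALL" roll-ups. Hypothetical helper, not part of Hadoop.
     */
    public static void add(Map<String, Long> totals,
                           String storage, String redundancy, long sizeMb) {
        totals.merge(storage + "|" + redundancy, sizeMb, Long::sum);
        totals.merge(storage + "|ALL", sizeMb, Long::sum);
        totals.merge("ALL|" + redundancy, sizeMb, Long::sum);
    }

    /** Builds the totals for the four directories listed in the mock-up. */
    public static Map<String, Long> exampleTotals() {
        Map<String, Long> totals = new TreeMap<>();
        add(totals, "ARCHIVE", "EC:RS-DEFAULT-6-3-64k", 100); // /home/rgao/ds
        add(totals, "DISK", "EC:RS-DEFAULT-6-3-64k", 250);    // /home/rgao/ds-2
        add(totals, "DISK", "Replica", 200);                  // noECBackup-ds
        add(totals, "DISK", "Replica", 500);                  // noECBackup-ds-2
        return totals;
    }

    public static void main(String[] args) {
        // One tab-delimited line per bucket, sorted by key.
        exampleTotals().forEach((key, mb) -> System.out.println(key + "\t" + mb + " M"));
    }
}
```

Pairs with no data (such as Archive with Replica in the mock-up) simply do not appear in this sketch; printing them as 0 would need the full set of storage types and policies to be known up front.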