[ 
https://issues.apache.org/jira/browse/HDFS-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965223#comment-15965223
 ] 

Andrew Wang commented on HDFS-10531:
------------------------------------

Hi Sammi, thanks for the comment,

bq. From the user's point of view, putting the function in "ls" is better than 
putting it in the "ec" function, because "ls" already has a column showing the 
file replication factor. EC is one of the file replication schemes, so it's 
natural to show a file's EC policy here. However, it will make the 
"ec -getPolicy" sub-function a little redundant.

Since {{ls}} output is probably very commonly parsed by end users, we should be 
careful about changing it. IMO we should add a new flag to also display the EC 
policy.
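
As a rough illustration of the parsing concern, here is a minimal sketch of the 
kind of positional parser a user script might contain. The function name, the 
sample lines, and the "changed" layout with an extra EC policy column are all 
hypothetical; nothing here reflects an actual or proposed output format.

```python
# Hypothetical user script: parse one `hdfs dfs -ls` line by column position.
def parse_ls_line(line):
    """Split an -ls line, assuming the current 8-column layout:
    permissions, replication, owner, group, size, date, time, path."""
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {"replication": repl, "size": int(size), "path": path}


# Sample line in the current layout (values are made up).
current = ("-rw-r--r--   3 rgao supergroup  104857600 "
           "2017-04-11 10:15 /home/rgao/ds/part-0")

# Hypothetical layout with an EC policy column inserted after replication.
changed = ("-rw-r--r--   - RS-DEFAULT-6-3-64k rgao supergroup  104857600 "
           "2017-04-11 10:15 /home/rgao/ds/part-0")

print(parse_ls_line(current)["size"])

try:
    parse_ls_line(changed)
except ValueError as exc:
    # The size field now holds the group name, so int() fails.
    print("parser broke:", exc)
```

With the default output left untouched and the EC policy shown only behind a 
new flag, a script like this keeps working unless it opts in.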

bq. Cluster-wide stats are helpful. In a multi-tenant cluster environment, 
per-directory stats will also be helpful, so having an EC policy summary in 
the "du" command can help users.

I liked "count" better since "du" is expected to behave like the Unix "du" 
command. It's also likely that there are users parsing "du" output, whereas 
"count" is something HDFS-specific that we can more easily extend.

bq. As for this JIRA, since an EC file is no different from a 3-way replicated 
file from a quota point of view, it's not clear what users would gain from 
knowing how much quota is used by each type of EC policy. So I would not 
recommend adding "EC" information to the "hdfs dfs -count" command.

"count -q" is specific to quotas. Since we don't have quotas for EC, I agree 
that it doesn't make sense to add this to "-q", but we could add a new flag to 
display EC usage.

> Add EC policy and storage policy related usage summarization function to dfs 
> du command
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-10531
>                 URL: https://issues.apache.org/jira/browse/HDFS-10531
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Rui Gao
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-10531.001.patch
>
>
> Currently du command output:
> {code}
>         [ ~]$ hdfs dfs -du  -h /home/rgao/
>         0      /home/rgao/.Trash
>         0      /home/rgao/.staging
>         100 M  /home/rgao/ds
>         250 M  /home/rgao/ds-2
>         200 M  /home/rgao/noECBackup-ds
>         500 M  /home/rgao/noECBackup-ds-2
> {code}
> For HDFS users and administrators, a usage summary by EC policy and storage 
> policy would be very helpful when managing cluster storage. A mock-up of the 
> du output could look like the following.
> {code}
>         [ ~]$ hdfs dfs -du  -h -t( total, parameter to be added) /home/rgao
>          
>         0      /home/rgao/.Trash
>         0      /home/rgao/.staging
>         [Archive] [EC:RS-DEFAULT-6-3-64k]  100 M  /home/rgao/ds
>         [DISK]    [EC:RS-DEFAULT-6-3-64k]  250 M  /home/rgao/ds-2
>         [DISK]    [Replica]                200 M  /home/rgao/noECBackup-ds
>         [DISK]    [Replica]                500 M  /home/rgao/noECBackup-ds-2
>          
>         Total:
>          
>         [Archive] [EC:RS-DEFAULT-6-3-64k]  100 M
>         [Archive] [Replica]                  0 M
>         [DISK]    [EC:RS-DEFAULT-6-3-64k]  250 M
>         [DISK]    [Replica]                700 M
>      
>         [Archive] [ALL]                    100 M
>         [DISK]    [ALL]                    950 M
>         [ALL]     [EC:RS-DEFAULT-6-3-64k]  350 M
>         [ALL]     [Replica]                700 M
> {code}     
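
The totals in the mock-up above can be reproduced with a small aggregation. The 
following is only a sketch under stated assumptions: the row tuples are copied 
by hand from the mock-up, sizes are in megabytes, the two zero-size entries 
that carry no policy labels are left out, and `summarize` is a hypothetical 
helper, not part of any patch.

```python
from collections import defaultdict

# Rows copied from the mock-up:
# (storage policy, redundancy scheme, size in MB, path).
rows = [
    ("Archive", "EC:RS-DEFAULT-6-3-64k", 100, "/home/rgao/ds"),
    ("DISK", "EC:RS-DEFAULT-6-3-64k", 250, "/home/rgao/ds-2"),
    ("DISK", "Replica", 200, "/home/rgao/noECBackup-ds"),
    ("DISK", "Replica", 500, "/home/rgao/noECBackup-ds-2"),
]


def summarize(rows):
    """Sum sizes per (storage, scheme) pair, plus the two 'ALL' marginals."""
    totals = defaultdict(int)
    for storage, scheme, size_mb, _path in rows:
        totals[(storage, scheme)] += size_mb
        totals[(storage, "ALL")] += size_mb
        totals[("ALL", scheme)] += size_mb
    return dict(totals)


totals = summarize(rows)
for key in sorted(totals):
    print(key, totals[key], "M")
```

Combinations with no data, such as Archive with Replica, simply do not appear 
in the result; the mock-up prints them as 0 M.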


