[ 
https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209910#comment-17209910
 ] 

Wei-Chiu Chuang commented on HDFS-15597:
----------------------------------------

Thanks for the patch.

I spent some time trying to understand it.

(1) This affects FileContext. FileSystem#getContentSummary() has exactly the 
same implementation (and thus the same bug), but when it is used for HDFS, 
DistributedFileSystem#getContentSummary() overrides it and the NameNode provides 
the correct space usage. The bug only appears when FileContext is used.

(2) The patch addresses the bug for HDFS. However, the result will be incorrect 
for HDFS-EC files (replication=0).
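
To make the arithmetic behind (2) concrete, here is a minimal sketch in plain 
Java with no Hadoop dependency. I am assuming the patch scales the file length 
by the replication factor reported for the file; the class and method names 
below are hypothetical and are not taken from the patch or from Hadoop.

```java
// Sketch only: plain Java, no Hadoop classes. All names are hypothetical.
final class SpaceConsumedSketch {

    // Behaviour described in the report: only the file length is counted,
    // the replication factor is ignored.
    static long lengthOnly(long fileLength) {
        return fileLength;
    }

    // Assumed shape of the fix: length multiplied by the reported replication.
    static long lengthTimesReplication(long fileLength, short replication) {
        return fileLength * replication;
    }

    public static void main(String[] args) {
        long length = 3145728L; // size of size-test in the reporter's listing

        // Replication ignored.
        System.out.println(lengthOnly(length));
        // Replicated file with dfs.replication=2.
        System.out.println(lengthTimesReplication(length, (short) 2));
        // A file whose reported replication is 0 yields zero space consumed.
        System.out.println(lengthTimesReplication(length, (short) 0));
    }
}
```

With replication=2 the product is the doubled size the reporter expected. With 
replication=0 the product collapses to zero, which is the concern for HDFS-EC.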

> ContentSummary.getSpaceConsumed does not consider replication
> -------------------------------------------------------------
>
>                 Key: HDFS-15597
>                 URL: https://issues.apache.org/jira/browse/HDFS-15597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 2.6.0
>            Reporter: Ajmal Ahammed
>            Assignee: Aihua Xu
>            Priority: Minor
>         Attachments: HDFS-15597.patch
>
>
> I am trying to get the disk space consumed by an HDFS directory using the 
> {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption 
> correctly considering the replication factor. The replication factor is 2, 
> and I was expecting twice the size of the actual file size from the above 
> method.
> {code}
> ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu
> Found 2 items
> -rw-r--r--   2 ubuntu ubuntu    3145728 2020-09-08 09:55 /var/lib/ubuntu/size-test
> drwxrwxr-x   - ubuntu ubuntu          0 2020-09-07 06:37 /var/lib/ubuntu/test
> {code}
> But when I run the following code,
> {code}
> String path = "/etc/hadoop/conf/";
> conf.addResource(new Path(path + "core-site.xml"));
> conf.addResource(new Path(path + "hdfs-site.xml"));
> long size = FileContext.getFileContext(conf).util()
>     .getContentSummary(fileStatus).getSpaceConsumed();
> System.out.println("Replication : " + fileStatus.getReplication());
> System.out.println("File size : " + size);
> {code}
> The output is
> {code}
> Replication : 0
> File size : 3145728
> {code}
> Both the file size and the replication factor seem to be incorrect.
> /etc/hadoop/conf/hdfs-site.xml contains the following config:
> {code}
>   <property>
>     <name>dfs.replication</name>
>     <value>2</value>
>   </property>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
