[ https://issues.apache.org/jira/browse/HDFS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999626#comment-12999626 ]
Tsz Wo (Nicholas), SZE commented on HDFS-1658:
----------------------------------------------
> Currently in order to figure out a directory size, we have to list a
> directory ...
We may use {{FileSystem.getContentSummary(Path)}} or the "fs -count" shell command instead.
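
To illustrate the difference between the two approaches, here is a minimal sketch using a toy in-memory tree. It is not Hadoop code: the class DirSizeSketch, the Node type, and both methods are hypothetical names made up for this example. It assumes, as the issue states, that a directory entry reports a length of 0, so summing a listing only counts the immediate child files, while a summary-style call aggregates over the whole subtree.

```java
import java.util.Arrays;
import java.util.List;

/** Toy in-memory namespace. NOT Hadoop code; every name here is hypothetical. */
public class DirSizeSketch {

    /** A file (children == null) or a directory (children != null). */
    static final class Node {
        final String name;
        final long length;         // bytes for a file; always 0 for a directory
        final List<Node> children; // null for a file

        private Node(String name, long length, List<Node> children) {
            this.name = name;
            this.length = length;
            this.children = children;
        }

        static Node file(String name, long length) {
            return new Node(name, length, null);
        }

        static Node dir(String name, Node... kids) {
            return new Node(name, 0L, Arrays.asList(kids));
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    /**
     * Listing-based approach: fetch every entry of the directory and add up
     * the reported lengths. Subdirectories contribute 0 because their
     * length field is 0.
     */
    static long sizeByListing(Node dir) {
        long total = 0;
        for (Node child : dir.children) {
            total += child.length;
        }
        return total;
    }

    /**
     * Summary-style approach: one call that returns the total length of all
     * files under the node, similar in spirit to a content summary.
     */
    static long summaryLength(Node node) {
        if (!node.isDirectory()) {
            return node.length;
        }
        long total = 0;
        for (Node child : node.children) {
            total += summaryLength(child);
        }
        return total;
    }

    public static void main(String[] args) {
        Node root = Node.dir("logs",
                Node.file("a.log", 100),
                Node.file("b.log", 250),
                Node.dir("old", Node.file("c.log", 50)));

        System.out.println(sizeByListing(root));  // prints 350
        System.out.println(summaryLength(root));  // prints 400
    }
}
```

In this sketch the caller of summaryLength receives a single aggregate instead of one entry per child, which is the property that matters for huge directories. The two numbers differ because the listing-based sum ignores files in subdirectories.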
> A less expensive way to figure out directory size
> -------------------------------------------------
>
> Key: HDFS-1658
> URL: https://issues.apache.org/jira/browse/HDFS-1658
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
>
> Currently in order to figure out a directory size, we have to list a
> directory by calling the getListing RPC and sum the sizes of its children.
> This is an expensive operation if the directory is huge.
> On the other hand, when fetching the status of a path (i.e. calling the
> getFileInfo RPC), the length field of FileStatus is set to 0 if the path is a
> directory.
> I am thinking of changing this field (FileStatus#length) to hold the directory
> size when the path is a directory, so that we can call getFileInfo to get the
> directory size. This call is much less expensive and simpler than getListing.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira