[ https://issues.apache.org/jira/browse/HDFS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019970#comment-13019970 ]

Sanjay Radia commented on HDFS-1658:
------------------------------------

Having the length be the number of directory entries makes sense.
However, it is an incompatible (semantic) change and may break some
applications. I don't have a good sense of which applications, if any, it
may break.
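
To make the compatibility concern concrete, here is one hypothetical pattern
whose behavior would change: client code that totals the lengths returned by a
recursive walk without checking whether each entry is a directory. This is a
self-contained sketch, not Hadoop code. `Entry` stands in for FileStatus, and
`lengthCurrent`, `lengthProposed`, `naiveTotal` and `defensiveTotal` are
illustrative names. The "proposed" rule assumes directories report their entry
count, as suggested in this issue.

```java
import java.util.List;

/**
 * Hypothetical illustration of the compatibility risk: code that sums lengths
 * over a listing. Entry is a stand-in for FileStatus, not the Hadoop class.
 */
public class LengthSemanticsSketch {

    static final class Entry {
        final boolean isDir;
        final long fileBytes;  // size in bytes for files; ignored for directories
        final int childCount;  // number of entries for directories; 0 for files

        Entry(boolean isDir, long fileBytes, int childCount) {
            this.isDir = isDir;
            this.fileBytes = fileBytes;
            this.childCount = childCount;
        }

        /** Current rule: directories report length 0. */
        long lengthCurrent() {
            return isDir ? 0 : fileBytes;
        }

        /** Proposed rule (assumed): directories report their entry count. */
        long lengthProposed() {
            return isDir ? childCount : fileBytes;
        }

        long length(boolean proposed) {
            return proposed ? lengthProposed() : lengthCurrent();
        }
    }

    /** Naive total that silently relies on directories reporting 0. */
    static long naiveTotal(List<Entry> entries, boolean proposed) {
        long total = 0;
        for (Entry e : entries) {
            total += e.length(proposed);
        }
        return total;
    }

    /** Defensive total that skips directories, so either rule gives the same answer. */
    static long defensiveTotal(List<Entry> entries, boolean proposed) {
        long total = 0;
        for (Entry e : entries) {
            if (!e.isDir) {
                total += e.length(proposed);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A directory and its two files (100 and 50 bytes), as a recursive walk might return them.
        List<Entry> listing = List.of(
                new Entry(true, 0, 2),
                new Entry(false, 100, 0),
                new Entry(false, 50, 0));
        System.out.println("naive:     " + naiveTotal(listing, false) + " -> " + naiveTotal(listing, true));
        System.out.println("defensive: " + defensiveTotal(listing, false) + " -> " + defensiveTotal(listing, true));
    }
}
```

In this sketch the naive total moves from 150 to 152 once directories report
their entry count, while the defensive version stays at 150. Whether real
applications follow the naive pattern is exactly the open question above.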


BTW, I am planning to add a new method to FileContext that returns
FileStatusInfo (see HADOOP-7018).
There I was hoping to make the length be the number of directory entries.
Perhaps we could add the same new method to FileSystem.
I will probably have a patch out for review next week.




> A less expensive way to figure out directory size
> -------------------------------------------------
>
>                 Key: HDFS-1658
>                 URL: https://issues.apache.org/jira/browse/HDFS-1658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>
> Currently, to figure out a directory's size, we have to list the directory
> by calling the getListing RPC and count its children. This is an expensive
> operation, especially when a directory has many children, because it may
> require multiple RPCs.
> On the other hand, when fetching the status of a path (i.e. calling the
> getFileInfo RPC), the length field of FileStatus is set to 0 if the path is
> a directory.
> I am thinking of changing this field (FileStatus#length) to hold the
> directory size (its number of children) when the path is a directory, so
> that we can call getFileInfo to get the directory size. This call is much
> less expensive and simpler than getListing.
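
The cost difference described in the issue can be sketched with an in-memory
model. Everything here is hypothetical: `MockNameNode`, its `getListing` and
`getFileInfo` methods, the `rpcCount` counter and the 1000-entry
`LISTING_LIMIT` page size are stand-ins chosen for illustration, not the real
ClientProtocol signatures or the real listing limit. Each method call counts
as one RPC; listing pages through children in name order, while getFileInfo
returns the child count in a single call under the proposed semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of the two ways to size a directory discussed
 * in HDFS-1658. None of these names are the real HDFS API.
 */
public class DirSizeSketch {

    /** Simplified stand-in for FileStatus. */
    static final class Status {
        final boolean isDir;
        final long length;

        Status(boolean isDir, long length) {
            this.isDir = isDir;
            this.length = length;
        }
    }

    /** One directory's children plus a counter of simulated RPCs. */
    static final class MockNameNode {
        // Assumed page size for a single getListing call (illustrative only).
        static final int LISTING_LIMIT = 1000;
        // Child names kept sorted, so a listing can resume after the last name seen.
        final TreeSet<String> children = new TreeSet<>();
        int rpcCount = 0;

        /** Simulated RPC: up to LISTING_LIMIT child names sorting after startAfter. */
        List<String> getListing(String startAfter) {
            rpcCount++;
            List<String> batch = new ArrayList<>();
            for (String name : children.tailSet(startAfter, false)) {
                if (batch.size() == LISTING_LIMIT) {
                    break;
                }
                batch.add(name);
            }
            return batch;
        }

        /** Simulated RPC: directory status whose length is the child count (proposed rule). */
        Status getFileInfo() {
            rpcCount++;
            return new Status(true, children.size());
        }
    }

    /** Current approach: page through the full listing and count entries. */
    static long sizeByListing(MockNameNode nn) {
        long count = 0;
        String cursor = "";  // "" sorts before every non-empty name
        while (true) {
            List<String> batch = nn.getListing(cursor);
            count += batch.size();
            if (batch.size() < MockNameNode.LISTING_LIMIT) {
                return count;  // a short page means the listing is exhausted
            }
            cursor = batch.get(batch.size() - 1);
        }
    }

    /** Proposed approach: a single getFileInfo call. */
    static long sizeByFileInfo(MockNameNode nn) {
        return nn.getFileInfo().length;
    }

    public static void main(String[] args) {
        MockNameNode nn = new MockNameNode();
        for (int i = 0; i < 2500; i++) {
            nn.children.add(String.format("file-%05d", i));
        }
        long viaListing = sizeByListing(nn);
        int listingRpcs = nn.rpcCount;
        nn.rpcCount = 0;
        long viaInfo = sizeByFileInfo(nn);
        System.out.println(viaListing + " entries in " + listingRpcs + " RPCs vs "
                + viaInfo + " entries in " + nn.rpcCount + " RPC");
    }
}
```

With 2,500 children and a 1,000-entry page, the listing path needs 3 calls
while the status path needs 1; under these assumptions the listing cost grows
linearly with the number of children, while the status call stays constant.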

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
