A less expensive way to figure out directory size
-------------------------------------------------
Key: HDFS-1658
URL: https://issues.apache.org/jira/browse/HDFS-1658
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang

Currently, to figure out the size of a directory, we have to list it by calling the RPC getListing and sum the lengths of its children. This is an expensive operation if the directory is huge.

On the other hand, when fetching the status of a path (i.e., calling the RPC getFileInfo), the length field of FileStatus is set to 0 if the path is a directory. I am thinking of changing this field (FileStatus#length) to hold the directory size when the path is a directory, so that we can call getFileInfo to get the directory size. This call is much cheaper and simpler than getListing, since it returns a single FileStatus rather than one entry per child.
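The following minimal Java sketch uses a toy in-memory namespace to make the cost difference concrete. Everything in it (DirSizeSketch, Status, Directory, addFile, sizeByListing, sizeByFileInfo) is hypothetical and is not the HDFS FileStatus or NameNode API. It assumes "directory size" means the sum of the lengths of the directory's immediate children, which is what summing a getListing result yields. It also assumes the server keeps a running total per directory so that the status lookup does not walk the children. The proposal itself does not say how the size would be maintained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; NOT the real HDFS FileStatus or NameNode classes.
public class DirSizeSketch {

    // Minimal stand-in for FileStatus: a path, a directory flag, and a length.
    record Status(String path, boolean isDir, long length) {}

    // Toy directory that keeps a running total of its immediate children's lengths.
    // Assumption: the total is updated on every add, so reading it is O(1).
    static class Directory {
        private final String path;
        private final List<Status> children = new ArrayList<>();
        private long totalLength = 0;

        Directory(String path) { this.path = path; }

        void addFile(String name, long length) {
            children.add(new Status(path + "/" + name, false, length));
            totalLength += length; // keep the cached directory size in sync
        }

        // Analogue of getListing: returns every child, so cost grows with directory size.
        List<Status> getListing() { return List.copyOf(children); }

        // Analogue of getFileInfo under the proposal: length carries the directory size
        // instead of 0.
        Status getFileInfo() { return new Status(path, true, totalLength); }
    }

    // Current approach: list the directory and sum the children's lengths on the client.
    static long sizeByListing(Directory dir) {
        long sum = 0;
        for (Status child : dir.getListing()) {
            sum += child.length();
        }
        return sum;
    }

    // Proposed approach: a single status lookup.
    static long sizeByFileInfo(Directory dir) {
        return dir.getFileInfo().length();
    }

    public static void main(String[] args) {
        Directory dir = new Directory("/user/data");
        dir.addFile("a.log", 100);
        dir.addFile("b.log", 250);
        dir.addFile("c.log", 50);
        System.out.println(sizeByListing(dir));  // 400
        System.out.println(sizeByFileInfo(dir)); // 400
    }
}
```

Both approaches return 400 here. The difference is that sizeByListing touches every child, so its cost grows with the directory, while sizeByFileInfo does a constant amount of work. The trade-off of a running total is that every add (and, in a fuller model, every delete, append, or rename) must also update the parent's counter.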