[ https://issues.apache.org/jira/browse/HADOOP-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649897#action_12649897 ]
Chris Douglas commented on HADOOP-4339:
---------------------------------------

In FsShell, it makes more sense to save the length instead of the ContentSummary. The FileSystem change looks good.

> Improve FsShell -du/-dus and FileSystem.getContentSummary efficiency
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4339
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4339
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.18.1
>            Reporter: David Phillips
>         Attachments: hadoop-fsshell-du-simple.patch
>
>
> FsShell.du has two inefficiencies:
> * calling getContentSummary twice for each top-level item rather than calling it once and saving the result
> * calling getContentSummary for files rather than using the size it already has in FileStatus
> getContentSummary has one:
> * calling itself for files rather than using the length it already has in FileStatus
> Every call to getContentSummary results in a call to getFileStatus, which may be expensive (e.g. NativeS3FileSystem has both network latency and actual monetary cost).
> The simple solution:
> * FsShell.du calls once per item and saves the ContentSummary
> * FsShell.du uses FileStatus.getLen for files
> * getContentSummary only calls itself for directories
> Another solution, rather than adding special casing to callers, is to add a getContentSummary that takes a FileStatus.
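
The "simple solution" for getContentSummary can be illustrated with a rough sketch. It uses an in-memory stand-in, not Hadoop's FileSystem API or the attached patch. The names DuSketch, Status, lookup, statusCalls, and sampleTree are hypothetical; lookup stands in for a getFileStatus-style call that is assumed to be expensive, and the children list stands in for what a directory listing already returns.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: an in-memory model, not Hadoop's FileSystem API. */
final class DuSketch {

    /** Hypothetical stand-in for FileStatus: name, directory flag, length. */
    static final class Status {
        final String name;
        final boolean isDir;
        final long len;
        final List<Status> children = new ArrayList<>();

        Status(String name, boolean isDir, long len) {
            this.name = name;
            this.isDir = isDir;
            this.len = len;
        }
    }

    /** Counts simulated status lookups (assumed to be the costly call). */
    static int statusCalls = 0;

    /** Simulates a getFileStatus-style lookup for a path. */
    static Status lookup(Status path) {
        statusCalls++;
        return path;
    }

    /** Recurses into every child, so each file costs one extra lookup. */
    static long naiveLength(Status path) {
        Status s = lookup(path);
        if (!s.isDir) {
            return s.len;
        }
        long total = 0;
        for (Status child : s.children) {
            total += naiveLength(child);
        }
        return total;
    }

    /** Recurses only into directories; file lengths come from the listing. */
    static long improvedLength(Status path) {
        Status s = lookup(path);
        if (!s.isDir) {
            return s.len;
        }
        long total = 0;
        for (Status child : s.children) {
            total += child.isDir ? improvedLength(child) : child.len;
        }
        return total;
    }

    /** Builds root/{a (10 bytes), b (20 bytes), sub/{c (30 bytes)}}. */
    static Status sampleTree() {
        Status root = new Status("root", true, 0);
        Status sub = new Status("sub", true, 0);
        root.children.add(new Status("a", false, 10));
        root.children.add(new Status("b", false, 20));
        sub.children.add(new Status("c", false, 30));
        root.children.add(sub);
        return root;
    }

    public static void main(String[] args) {
        Status root = sampleTree();

        statusCalls = 0;
        long naive = naiveLength(root);
        System.out.println("naive:    " + naive + " bytes, " + statusCalls + " lookups");

        statusCalls = 0;
        long improved = improvedLength(root);
        System.out.println("improved: " + improved + " bytes, " + statusCalls + " lookups");
    }
}
```

On this sample tree both methods report 60 bytes, but the naive version performs five lookups (one per path) while the improved version performs two (one per directory). The same idea applies to the FsShell side of the comment above: keeping only the length per item avoids a second summary call when printing.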