[ https://issues.apache.org/jira/browse/CRUNCH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823308#comment-16823308 ]
Jon Hemphill commented on CRUNCH-683:
-------------------------------------

PR is available here: [https://github.com/apache/crunch/pull/23]

> Avoid unnecessary listStatus calls from getSize computation
> -----------------------------------------------------------
>
> Key: CRUNCH-683
> URL: https://issues.apache.org/jira/browse/CRUNCH-683
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.14.0
> Reporter: Jon Hemphill
> Assignee: Josh Wills
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The getPathSize computation in SourceTargetHelper currently makes unnecessary
> listStatus calls when recursing over a directory, which can cause performance
> issues when the filesystem is an object store such as S3. The performance can
> be improved with the addition of a private method to use for the getPathSize
> recursion that takes a known FileStatus object as a parameter.
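The proposed change can be illustrated with a minimal sketch. This is not the code in the PR. `Status`, `CountingStore`, `naiveSize`, `sizeOf` and `sampleTree` are hypothetical names, and the store is an in-memory stand-in for Hadoop's `FileSystem` that counts every metadata call, since on an object store such as S3 each call is a remote request. The naive recursion looks up every child again by path, even though the directory listing already returned that child's status. The improved version has a public entry point that fetches the root status once, then passes each known status to a private helper, so only directories trigger a listing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PathSizeSketch {

  // Hypothetical stand-in for Hadoop's FileStatus: only the fields this sketch needs.
  record Status(String path, boolean isDirectory, long length) {}

  // Hypothetical in-memory metadata store. Every lookup increments `calls`,
  // mimicking the cost of a remote request on an object store.
  static final class CountingStore {
    private final Map<String, Status> statuses = new HashMap<>();
    private final Map<String, List<Status>> children = new HashMap<>();
    int calls = 0;

    void addDir(String path, String parent) {
      add(new Status(path, true, 0), parent);
      children.put(path, new ArrayList<>());
    }

    void addFile(String path, long length, String parent) {
      add(new Status(path, false, length), parent);
    }

    private void add(Status s, String parent) {
      statuses.put(s.path(), s);
      if (parent != null) {
        children.get(parent).add(s);
      }
    }

    Status getFileStatus(String path) { calls++; return statuses.get(path); }

    List<Status> listStatus(String dir) { calls++; return children.get(dir); }
  }

  // Naive recursion: each child is re-fetched by path although the listing
  // already returned its status, costing one extra call per entry.
  static long naiveSize(CountingStore fs, String path) {
    Status s = fs.getFileStatus(path);
    if (!s.isDirectory()) {
      return s.length();
    }
    long total = 0;
    for (Status child : fs.listStatus(path)) {
      total += naiveSize(fs, child.path());
    }
    return total;
  }

  // Entry point: one status lookup for the root, then reuse known statuses.
  static long getPathSize(CountingStore fs, String path) {
    return sizeOf(fs, fs.getFileStatus(path));
  }

  // Private helper taking an already-known status: files cost no calls,
  // and each directory costs exactly one listStatus call.
  private static long sizeOf(CountingStore fs, Status s) {
    if (!s.isDirectory()) {
      return s.length();
    }
    long total = 0;
    for (Status child : fs.listStatus(s.path())) {
      total += sizeOf(fs, child);
    }
    return total;
  }

  // Sample tree: /data/a (10), /data/b (20), /data/sub/c (5).
  static CountingStore sampleTree() {
    CountingStore fs = new CountingStore();
    fs.addDir("/data", null);
    fs.addFile("/data/a", 10, "/data");
    fs.addFile("/data/b", 20, "/data");
    fs.addDir("/data/sub", "/data");
    fs.addFile("/data/sub/c", 5, "/data/sub");
    return fs;
  }

  public static void main(String[] args) {
    CountingStore naive = sampleTree();
    long n = naiveSize(naive, "/data");
    System.out.println("naive:    size=" + n + " calls=" + naive.calls);

    CountingStore improved = sampleTree();
    long m = getPathSize(improved, "/data");
    System.out.println("improved: size=" + m + " calls=" + improved.calls);
  }
}
```

On the sample tree both traversals report a size of 35, but the naive one makes 7 metadata calls and the improved one makes 3: one status lookup for the root plus one listing per directory. The saving grows with the number of files, which is why it matters most on object stores where each call is a network round trip.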