[ https://issues.apache.org/jira/browse/CRUNCH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823199#comment-16823199 ]
Jon Hemphill edited comment on CRUNCH-683 at 4/22/19 4:10 PM:
--------------------------------------------------------------

FYI, I will be submitting a PR for this shortly. This is related to CRUNCH-658. Although it doesn't skip the getSize check as suggested there, it does improve the performance enough that the symptoms noticed there may be acceptable.

was (Author: jonhemphill):
FYI, I will be submitting a PR for this shortly.

> Avoid unnecessary listStatus calls from getSize computation
> -----------------------------------------------------------
>
>                 Key: CRUNCH-683
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-683
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.14.0
>            Reporter: Jon Hemphill
>            Assignee: Josh Wills
>            Priority: Major
>
> The getPathSize computation in SourceTargetHelper currently makes unnecessary
> listStatus calls when recursing over a directory, which can cause performance
> issues when the filesystem is an object store such as S3. The performance can
> be improved with the addition of a private method to use for the getPathSize
> recursion that takes a known FileStatus object as a parameter.
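
The idea in the description can be illustrated with a small, self-contained sketch. This is not the Crunch or Hadoop code: Status, Store, sizeByPath, sizeByStatus and sizeOf are hypothetical stand-ins (Status plays the role of Hadoop's FileStatus, and Store is an in-memory filesystem that counts metadata calls). It assumes the cost comes from looking each child up again by path even though the directory listing already returned its status.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PathSizeSketch {
    /** Hypothetical stand-in for a file status: path, directory flag, length. */
    static final class Status {
        final String path;
        final boolean dir;
        final long len;

        Status(String path, boolean dir, long len) {
            this.path = path;
            this.dir = dir;
            this.len = len;
        }
    }

    /** Hypothetical in-memory store that counts every metadata call. */
    static final class Store {
        final Map<String, Status> entries = new HashMap<>();
        final Map<String, List<Status>> children = new HashMap<>();
        int calls = 0;

        void add(String parent, Status s) {
            entries.put(s.path, s);
            if (parent != null) {
                children.computeIfAbsent(parent, k -> new ArrayList<>()).add(s);
            }
        }

        Status getStatus(String path) {
            calls++;
            return entries.get(path);
        }

        List<Status> list(String path) {
            calls++;
            return children.getOrDefault(path, new ArrayList<>());
        }
    }

    /** Recurses by path, so every child costs one extra status lookup. */
    static long sizeByPath(Store fs, String path) {
        Status s = fs.getStatus(path);
        if (s == null) {
            return -1L;
        }
        if (!s.dir) {
            return s.len;
        }
        long size = 0;
        for (Status child : fs.list(path)) {
            size += sizeByPath(fs, child.path);
        }
        return size;
    }

    /** Looks the root up once, then hands known statuses to a private helper. */
    static long sizeByStatus(Store fs, String path) {
        Status s = fs.getStatus(path);
        return s == null ? -1L : sizeOf(fs, s);
    }

    private static long sizeOf(Store fs, Status s) {
        if (!s.dir) {
            return s.len; // status already known, no further call needed
        }
        long size = 0;
        for (Status child : fs.list(s.path)) {
            size += sizeOf(fs, child);
        }
        return size;
    }

    /** Builds /data with two files and one subdirectory holding a third file. */
    static Store sample() {
        Store fs = new Store();
        fs.add(null, new Status("/data", true, 0));
        fs.add("/data", new Status("/data/a", false, 10));
        fs.add("/data", new Status("/data/b", false, 20));
        fs.add("/data", new Status("/data/sub", true, 0));
        fs.add("/data/sub", new Status("/data/sub/c", false, 30));
        return fs;
    }

    public static void main(String[] args) {
        Store naive = sample();
        Store improved = sample();
        System.out.println(sizeByPath(naive, "/data") + " bytes, " + naive.calls + " calls");
        System.out.println(sizeByStatus(improved, "/data") + " bytes, " + improved.calls + " calls");
    }
}
```

Both variants return the same total, but the status-based recursion issues one call per directory plus a single lookup for the root, while the path-based one also pays for a lookup on every file. On an object store, where each metadata call is a remote request, that difference is what the issue aims to remove.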