[ https://issues.apache.org/jira/browse/CRUNCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824966#comment-16824966 ]

Steve Loughran commented on CRUNCH-658:
---------------------------------------

Commented on CRUNCH-683. If you actually call toString() on an s3a FS instance, the stats it prints out include the number of HTTP requests made, which helps you assess the real cost of operations.

> Add a way to skip the getSize checks for Sources from object stores
> -------------------------------------------------------------------
>
>                 Key: CRUNCH-658
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-658
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.14.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>            Priority: Major
>
> Ran into a problem when using Crunch to process a _lot_ of data from S3: the
> getSize checks can be very slow to run and don't materially add much to the
> overall processing of a pipeline when things like reducer counts are manually
> specified. I'd like to add a way to disable the file size checks, either
> globally or for specific input sources.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)