[ https://issues.apache.org/jira/browse/CRUNCH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823339#comment-16823339 ]
Jon Hemphill commented on CRUNCH-658:
-------------------------------------

I created CRUNCH-683 as a related issue that speeds up the getSize computation by an order of magnitude when the source is S3.

> Add a way to skip the getSize checks for Sources from object stores
> -------------------------------------------------------------------
>
> Key: CRUNCH-658
> URL: https://issues.apache.org/jira/browse/CRUNCH-658
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.14.0
> Reporter: Josh Wills
> Assignee: Josh Wills
> Priority: Major
>
> Ran into a problem when using Crunch to process a _lot_ of data from S3: the
> getSize checks can be very slow to run and don't materially add much to the
> overall processing of a pipeline when things like reducer counts are manually
> specified. I'd like to add a way to disable the file size checks, either
> globally or for specific input sources.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)