[ https://issues.apache.org/jira/browse/CRUNCH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Adler updated CRUNCH-165:
--------------------------------

    Attachment: CRUNCH-165.patch

Here is my patch to Crunch to use CombineFileInputFormat.

> Pipelines should automatically use CombineFileInputFormat where input consists of many small files
> --------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-165
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-165
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>         Attachments: CRUNCH-165.patch
>
>
> Hive had a feature introduced in HIVE-74 whereby CombineFileInputFormat would
> be used if the input data consisted of many small files, making the resulting
> mapreduce jobs more efficient by giving individual mappers more data to
> process. This would be a nice feature for Crunch to have, too.
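
To illustrate the idea in the quoted description, here is a minimal, self-contained sketch in plain Java of packing many small files into fewer, larger splits so that each mapper gets more data. It is not the attached patch and not Hadoop's actual CombineFileInputFormat logic (which also takes node and rack locality into account). The class name SmallFilePacker, the method pack, and the sample sizes are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a greedy, order-preserving packer that
// groups file sizes into "combined splits" no larger than maxSplitBytes.
class SmallFilePacker {

    // Returns one inner list per combined split; each entry is a file size.
    static List<List<Long>> pack(long[] fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;

        for (long size : fileSizes) {
            // Close the current split if adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A file larger than maxSplitBytes still gets a split of its own.
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed sample input: six small files and a 128-byte split limit.
        long[] sizes = {10, 20, 30, 40, 50, 60};
        List<List<Long>> splits = pack(sizes, 128);
        System.out.println(sizes.length + " files -> " + splits.size() + " splits: " + splits);
    }
}
```

Without combining, each of the six sample files would be handed to its own mapper; with the sketch above they are grouped into two splits, which is the efficiency gain the issue description refers to.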