[ https://issues.apache.org/jira/browse/AVRO-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bartłomiej Romański updated AVRO-1611:
--------------------------------------
    Attachment: CombineAvroKeyInputFormat.java

> Avro-mapred should provide CombineAvroKeyInputFormat
> ----------------------------------------------------
>
>                 Key: AVRO-1611
>                 URL: https://issues.apache.org/jira/browse/AVRO-1611
>             Project: Avro
>          Issue Type: Improvement
>            Reporter: Bartłomiej Romański
>            Priority: Minor
>         Attachments: CombineAvroKeyInputFormat.java
>
>
> A serious issue with Hadoop is dealing with a huge number of small files
> (they slow down processing, overload the namenode, etc.).
> A common remedy for this is to use CombineFileInputFormat. However, it is
> an abstract class that needs to be made concrete for each InputFormat it
> wraps. I believe Avro should provide CombineAvroKeyInputFormat, just as
> Hadoop provides CombineSequenceFileInputFormat and CombineTextInputFormat.
> I've attached a basic implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)