Bartłomiej Romański created AVRO-1611:
-----------------------------------------
Summary: Avro-mapred should provide CombineAvroKeyInputFormat
Key: AVRO-1611
URL: https://issues.apache.org/jira/browse/AVRO-1611
Project: Avro
Issue Type: Improvement
Reporter: Bartłomiej Romański
Priority: Minor
Attachments: CombineAvroKeyInputFormat.java
A serious issue with Hadoop is dealing with a huge number of small files (they
slow down processing, overload the NameNode, etc.).
A common remedy is to use CombineFileInputFormat. However, it is an abstract
class that needs a concrete subclass for each InputFormat it wraps. I believe
Avro should provide CombineAvroKeyInputFormat, just as Hadoop provides
CombineSequenceFileInputFormat and CombineTextInputFormat.
I've attached a basic implementation.
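For readers unfamiliar with why combining helps, here is a minimal sketch of
the idea. It is not the attached implementation and uses no Hadoop or Avro
API. It is a simplified model that assumes files are packed greedily into
splits up to a size limit, and it ignores the node and rack locality that the
real CombineFileInputFormat takes into account. The names SplitPackingSketch
and pack are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: the real CombineFileInputFormat also groups
// blocks by node and rack locality, which this sketch ignores.
class SplitPackingSketch {

    // Greedily packs file sizes (in bytes) into splits of at most
    // maxSplitBytes each. A file larger than the limit gets a split
    // of its own in this model.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical input: 1000 files of 1 MiB each, 128 MiB split limit.
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(1L << 20);
        }
        List<List<Long>> splits = pack(files, 128L << 20);
        System.out.println("input files: " + files.size());
        System.out.println("combined splits: " + splits.size());
    }
}
```

In this model the 1000 small files collapse into 8 splits, so a job would
schedule 8 map tasks instead of roughly one per file.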