Bartłomiej Romański created AVRO-1611:
-----------------------------------------

             Summary: Avro-mapred should provide CombineAvroKeyInputFormat
                 Key: AVRO-1611
                 URL: https://issues.apache.org/jira/browse/AVRO-1611
             Project: Avro
          Issue Type: Improvement
            Reporter: Bartłomiej Romański
            Priority: Minor
         Attachments: CombineAvroKeyInputFormat.java

A well-known problem with Hadoop is handling a huge number of small files: they
slow down processing, overload the NameNode, and so on.

A common remedy is to use CombineFileInputFormat. However, it is an abstract
class that needs to be subclassed for each InputFormat it wraps. I believe Avro
should provide CombineAvroKeyInputFormat, just as Hadoop provides
CombineSequenceFileInputFormat and CombineTextInputFormat.

I've attached a basic implementation.
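
For readers without access to the attachment, here is a minimal sketch of what
such a class could look like. It is not the attached file; it is modelled on
the way Hadoop's CombineTextInputFormat is written, and it assumes the new
(org.apache.hadoop.mapreduce) API with CombineFileRecordReaderWrapper
available, plus avro-mapred's AvroKeyInputFormat. The nested class name
AvroKeyRecordReaderWrapper is made up for illustration.

```java
import java.io.IOException;

import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReaderWrapper;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

/**
 * Sketch only: packs many small Avro container files into fewer splits
 * and reads each file with the regular AvroKeyInputFormat.
 */
public class CombineAvroKeyInputFormat<T>
    extends CombineFileInputFormat<AvroKey<T>, NullWritable> {

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public RecordReader<AvroKey<T>, NullWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    // CombineFileRecordReader creates one wrapper instance per file in the
    // combined split, via the (CombineFileSplit, TaskAttemptContext, Integer)
    // constructor, which it looks up reflectively.
    return new CombineFileRecordReader<AvroKey<T>, NullWritable>(
        (CombineFileSplit) split, context,
        (Class) AvroKeyRecordReaderWrapper.class);
  }

  /** Hypothetical helper: delegates one file of the split to AvroKeyInputFormat. */
  private static class AvroKeyRecordReaderWrapper<T>
      extends CombineFileRecordReaderWrapper<AvroKey<T>, NullWritable> {

    public AvroKeyRecordReaderWrapper(
        CombineFileSplit split, TaskAttemptContext context, Integer index)
        throws IOException, InterruptedException {
      super(new AvroKeyInputFormat<T>(), split, context, index);
    }
  }
}
```

A job would then select it with
job.setInputFormatClass(CombineAvroKeyInputFormat.class) and, if desired, cap
the combined split size through the static
FileInputFormat.setMaxInputSplitSize(job, bytes).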

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)