Scott Carey wrote:
That's too bad that the intermediate files can't use the Avro file format; the performance will suffer until that API changes to either allow custom file formats or to support a feature like the decoder's inputStream() method to allow buffering of chained or interleaved readers.

The intermediate files are part of the MapReduce kernel. The buffering, sorting, transmission, and merging of this data are a critical part of MapReduce, so I don't think it is as simple as just permitting a pluggable file format.
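What is pluggable in Hadoop today is the serialization used for intermediate keys and values (the io.serializations mechanism), not the on-disk format of those files. For illustration only, here is a rough sketch of what an Avro-backed serializer plugin could look like. The class name AvroIntermediateSerialization is made up, it assumes the newer Avro EncoderFactory/DecoderFactory API and Hadoop's org.apache.hadoop.io.serializer interfaces, and it is not Avro's actual shipped code:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.io.serializer.Deserializer;
import org.apache.hadoop.io.serializer.Serialization;
import org.apache.hadoop.io.serializer.Serializer;

public class AvroIntermediateSerialization
    implements Serialization<GenericRecord> {

  // A real implementation would read the map-output key/value schemas from
  // the job Configuration; a constructor argument keeps this sketch short.
  private final Schema schema;

  public AvroIntermediateSerialization(Schema schema) { this.schema = schema; }

  public boolean accept(Class<?> c) {
    return GenericRecord.class.isAssignableFrom(c);
  }

  public Serializer<GenericRecord> getSerializer(Class<GenericRecord> c) {
    return new Serializer<GenericRecord>() {
      private final GenericDatumWriter<GenericRecord> writer =
          new GenericDatumWriter<GenericRecord>(schema);
      private BinaryEncoder encoder;

      public void open(OutputStream out) {   // the framework hands us its stream
        encoder = EncoderFactory.get().binaryEncoder(out, encoder);
      }
      public void serialize(GenericRecord datum) throws IOException {
        writer.write(datum, encoder);
        encoder.flush();   // the datum's bytes must reach Hadoop's buffer now
      }
      public void close() throws IOException {
        if (encoder != null) encoder.flush();
      }
    };
  }

  public Deserializer<GenericRecord> getDeserializer(Class<GenericRecord> c) {
    return new Deserializer<GenericRecord>() {
      private final GenericDatumReader<GenericRecord> reader =
          new GenericDatumReader<GenericRecord>(schema);
      private BinaryDecoder decoder;

      public void open(InputStream in) {
        // Non-buffering decoder so we never read ahead past the bytes
        // Hadoop hands us for a single key or value.
        decoder = DecoderFactory.get().directBinaryDecoder(in, decoder);
      }
      public GenericRecord deserialize(GenericRecord reuse) throws IOException {
        return reader.read(reuse, decoder);
      }
      public void close() { }
    };
  }
}

A plugin along these lines would be registered through the io.serializations property of the job configuration so the framework can use it for map-output keys and values; a real version would also need a no-argument constructor and would get its schemas from the Configuration rather than from a constructor argument.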

FYI, Avro does not work with Hadoop 0.20 for CDH2 or CDH3 (I have not tried plain 0.20) because they include Jackson 1.0.1 and you'll get an exception like this:

Can't one update the version of Jackson in one's Hadoop cluster to fix this? However, that might not work with Amazon's Elastic MapReduce, where you don't get to update the cluster (which runs Hadoop 0.18).

Should we avoid using org.codehaus.jackson.JsonFactory.enable() to make Avro compatible with older versions of Jackson?
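If the goal is one Avro jar that runs against whichever Jackson 1.x happens to be first on the classpath, a small reflective shim is one option. This is only a sketch: the pre-1.2 fallback method name enableParserFeature is an assumption I have not verified against 1.0.1, and ALLOW_COMMENTS is used purely as an example feature.

import java.lang.reflect.Method;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;

public class JacksonCompat {
  // Hypothetical helper: enable a JsonParser.Feature on a JsonFactory without
  // compiling against JsonFactory.enable(), which only newer Jackson 1.x has.
  public static void enableFeature(JsonFactory factory, JsonParser.Feature feature) {
    try {
      // Jackson 1.2+ style; compiling directly against this call is what
      // produces NoSuchMethodError when an older jackson jar shadows Avro's.
      Method enable = JsonFactory.class.getMethod("enable", JsonParser.Feature.class);
      enable.invoke(factory, feature);
    } catch (NoSuchMethodException e) {
      try {
        // Assumed (unverified) pre-1.2 method name for the same operation.
        Method old = JsonFactory.class.getMethod("enableParserFeature",
                                                 JsonParser.Feature.class);
        old.invoke(factory, feature);
      } catch (Exception e2) {
        throw new RuntimeException("Cannot enable " + feature, e2);
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}

Example use, e.g. when constructing the factory used to parse schemas:

  JsonFactory factory = new JsonFactory();
  JacksonCompat.enableFeature(factory, JsonParser.Feature.ALLOW_COMMENTS);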

Doug
