Scott Carey wrote:
That's too bad that the intermediate files can't use the Avro file format; the performance will suffer until that API changes to either allow custom file formats or to support a feature like the decoder's inputStream() method to allow buffering of chained or interleaved readers.

The intermediate files are part of the MapReduce kernel. The buffering, sorting, transmission, and merging of this data are a critical part of MapReduce, so I don't think it is as simple as just permitting a pluggable file format.
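What is pluggable in Hadoop today is the serialization used for intermediate keys and values (the io.serializations mechanism), not the on-disk format of those files. For illustration only, here is a rough sketch of what an Avro-backed serializer plugin could look like. The class name AvroIntermediateSerialization is made up, it assumes the newer Avro EncoderFactory/DecoderFactory API and Hadoop's org.apache.hadoop.io.serializer interfaces, and it is not Avro's actual shipped code:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.io.serializer.Deserializer;
import org.apache.hadoop.io.serializer.Serialization;
import org.apache.hadoop.io.serializer.Serializer;

public class AvroIntermediateSerialization
    implements Serialization<GenericRecord> {

  // A real implementation would read the map-output key/value schemas from
  // the job Configuration; a constructor argument keeps this sketch short.
  private final Schema schema;

  public AvroIntermediateSerialization(Schema schema) { this.schema = schema; }

  public boolean accept(Class<?> c) {
    return GenericRecord.class.isAssignableFrom(c);
  }

  public Serializer<GenericRecord> getSerializer(Class<GenericRecord> c) {
    return new Serializer<GenericRecord>() {
      private final GenericDatumWriter<GenericRecord> writer =
          new GenericDatumWriter<GenericRecord>(schema);
      private BinaryEncoder encoder;

      public void open(OutputStream out) {   // the framework hands us its stream
        encoder = EncoderFactory.get().binaryEncoder(out, encoder);
      }
      public void serialize(GenericRecord datum) throws IOException {
        writer.write(datum, encoder);
        encoder.flush();   // the datum's bytes must reach Hadoop's buffer now
      }
      public void close() throws IOException {
        if (encoder != null) encoder.flush();
      }
    };
  }

  public Deserializer<GenericRecord> getDeserializer(Class<GenericRecord> c) {
    return new Deserializer<GenericRecord>() {
      private final GenericDatumReader<GenericRecord> reader =
          new GenericDatumReader<GenericRecord>(schema);
      private BinaryDecoder decoder;

      public void open(InputStream in) {
        // Non-buffering decoder so we never read ahead past the bytes
        // Hadoop hands us for a single key or value.
        decoder = DecoderFactory.get().directBinaryDecoder(in, decoder);
      }
      public GenericRecord deserialize(GenericRecord reuse) throws IOException {
        return reader.read(reuse, decoder);
      }
      public void close() { }
    };
  }
}

A plugin along these lines would be registered through the io.serializations property of the job configuration so the framework can use it for map-output keys and values; a real version would also need a no-argument constructor and would get its schemas from the Configuration rather than from a constructor argument.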

FYI, Avro does not work with Hadoop 0.20 for CDH2 or CDH3 (I have not tried plain 0.20) because they include Jackson 1.0.1 and you'll get an exception like this:

Can't one update the version of Jackson in one's Hadoop cluster to fix this? However, that might not work with Amazon's Elastic MapReduce, where you don't get to update the cluster (which runs Hadoop 0.18).

Should we avoid using org.codehaus.jackson.JsonFactory.enable() to make Avro compatible with older versions of Jackson?
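If the goal is one Avro jar that runs against whichever Jackson 1.x happens to be first on the classpath, a small reflective shim is one option. This is only a sketch: the pre-1.2 fallback method name enableParserFeature is an assumption I have not verified against 1.0.1, and ALLOW_COMMENTS is used purely as an example feature.

import java.lang.reflect.Method;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;

public class JacksonCompat {
  // Hypothetical helper: enable a JsonParser.Feature on a JsonFactory without
  // compiling against JsonFactory.enable(), which only newer Jackson 1.x has.
  public static void enableFeature(JsonFactory factory, JsonParser.Feature feature) {
    try {
      // Jackson 1.2+ style; compiling directly against this call is what
      // produces NoSuchMethodError when an older jackson jar shadows Avro's.
      Method enable = JsonFactory.class.getMethod("enable", JsonParser.Feature.class);
      enable.invoke(factory, feature);
    } catch (NoSuchMethodException e) {
      try {
        // Assumed (unverified) pre-1.2 method name for the same operation.
        Method old = JsonFactory.class.getMethod("enableParserFeature",
                                                 JsonParser.Feature.class);
        old.invoke(factory, feature);
      } catch (Exception e2) {
        throw new RuntimeException("Cannot enable " + feature, e2);
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}

Example use, e.g. when constructing the factory used to parse schemas:

  JsonFactory factory = new JsonFactory();
  JacksonCompat.enableFeature(factory, JsonParser.Feature.ALLOW_COMMENTS);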

Doug
