On Mar 31, 2010, at 10:31 AM, Doug Cutting (JIRA) wrote:
>> AvroKeySerialization: I am a bit confused about this class.
> 
> It's used to serialize map outputs and deserialize reduce inputs.  The 
> mapreduce framework uses the job's specified map output key class to find the 
> serialization implementation it uses to read and write intermediate keys and 
> values.
> 
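To make the lookup described above concrete, here is a minimal sketch in plain Java of choosing a serialization from the map output key class. It is only a model of the idea, not Hadoop's actual code: KeySerialization, SerializationLookup, SerializationLookupDemo and AvroKeyMarker are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pluggable serialization; not Hadoop's real interface.
interface KeySerialization {
    boolean accepts(Class<?> keyClass);
    String name();
}

// Hypothetical registry: the first registered serialization that accepts the class wins.
final class SerializationLookup {
    private final List<KeySerialization> registered = new ArrayList<>();

    void register(KeySerialization s) {
        registered.add(s);
    }

    KeySerialization forKeyClass(Class<?> keyClass) {
        for (KeySerialization s : registered) {
            if (s.accepts(keyClass)) {
                return s;
            }
        }
        throw new IllegalArgumentException("no serialization for " + keyClass.getName());
    }
}

final class SerializationLookupDemo {
    // Hypothetical marker type standing in for the job's map output key class.
    static final class AvroKeyMarker {}

    static KeySerialization of(String name, Class<?> accepted) {
        return new KeySerialization() {
            public boolean accepts(Class<?> c) { return accepted.isAssignableFrom(c); }
            public String name() { return name; }
        };
    }

    static SerializationLookup defaults() {
        SerializationLookup lookup = new SerializationLookup();
        lookup.register(of("avro", AvroKeyMarker.class));
        lookup.register(of("java-serializable", java.io.Serializable.class));
        return lookup;
    }

    public static void main(String[] args) {
        SerializationLookup lookup = defaults();
        // The key class alone decides which serialization handles intermediate data.
        System.out.println(lookup.forKeyClass(AvroKeyMarker.class).name());
        System.out.println(lookup.forKeyClass(String.class).name());
    }
}
```

The point of the sketch is that the framework never sees the file format, only the key class, which is why a marker class like AvroKeySerialization's key type is needed to route intermediate data through Avro.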
It's too bad that the intermediate files can't use the Avro file format. 
Performance will suffer until that API changes, either to allow custom file 
formats or to support a feature like the decoder's inputStream() method, which 
allows buffering of chained or interleaved readers.

>> Deprecated APIs are used - are the replacements not appropriate or 
>> insufficient?
> 
> Good question.  Hadoop 0.20 deprecated the "old" org.apache.hadoop.mapred 
> APIs to encourage folks to try the new org.apache.hadoop.mapreduce APIs.  
> However the org.apache.hadoop.mapreduce APIs are not fully functional in 
> 0.20, and folks primarily continue to use the org.apache.hadoop.mapred APIs.  
> 0.20 is used here since it's in Maven repos, but this code should also work 
> against 0.19 and perhaps even 0.18, and I'd compile against one of those 
> instead if it were in a Maven repo.

FYI, Avro does not work with Hadoop 0.20 for CDH2 or CDH3 (I have not tried 
plain 0.20) because they include Jackson 1.0.1, and you'll get an exception 
like this:

2010-03-31 11:00:55,616 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
        at org.apache.avro.Schema.<clinit>(Schema.java:81)
        at com.rr.avro.ViewEvent.<clinit>(ViewEvent.java:5)
        at com.rr.eventdata.ViewRecord.<init>(ViewRecord.java:60)
        at com.rr.eventdata.AvroSerializable.<clinit>(AvroSerializable.java:17)
        at com.rr.eventdata.AvroFileReader.createClickDatumReader(AvroFileReader.java:50)

This happens because mappers/reducers don't live in their own classloader 
space, so the contents of the default Hadoop lib directory take priority in 
class load order.
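
One way to confirm which jar wins is to ask the JVM where a class was loaded from. The following is a minimal sketch using only standard Java reflection. ClassOrigin and locate are hypothetical names, and the class name passed in main is the one from the stack trace above. It would need to run inside a task (or with the same classpath as one) to show what the task actually sees.

```java
import java.security.CodeSource;

final class ClassOrigin {
    /** Returns the location a class was loaded from, or a short note if unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: find the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.*) have no code source.
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoSuchMethodError in the trace above.
        System.out.println(locate("org.codehaus.jackson.JsonFactory"));
    }
}
```

If the printed location points into the Hadoop lib directory rather than the job's own jar, the older Jackson is the one being loaded.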



> 
> 
>> hadoop mapreduce support for avro data
>> --------------------------------------
>> 
>>                Key: AVRO-493
>>                URL: https://issues.apache.org/jira/browse/AVRO-493
>>            Project: Avro
>>         Issue Type: New Feature
>>         Components: java
>>           Reporter: Doug Cutting
>>           Assignee: Doug Cutting
>>        Attachments: AVRO-493.patch, AVRO-493.patch
>> 
>> 
>> Avro should provide support for using Hadoop MapReduce over Avro data files.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
