On Mar 31, 2010, at 10:31 AM, Doug Cutting (JIRA) wrote:

>> AvroKeySerialization: I am a bit confused about this class.
>
> It's used to serialize map outputs and deserialize reduce inputs. The
> mapreduce framework uses the job's specified map output key class to find the
> serialization implementation it uses to read and write intermediate keys and
> values.
>

It's too bad that the intermediate files can't use the Avro file format. Performance will suffer until that API changes to either allow custom file formats or support a feature like the decoder's inputStream() method to allow buffering of chained or interleaved readers.

>> Deprecated APIs are used - are the replacements not appropriate or
>> insufficient?
>
> Good question. Hadoop 0.20 deprecated the "old" org.apache.hadoop.mapred
> APIs to encourage folks to try the new org.apache.hadoop.mapreduce APIs.
> However the org.apache.hadoop.mapreduce APIs are not fully functional in
> 0.20, and folks primarily continue to use the org.apache.hadoop.mapred APIs.
> 0.20 is used here since it's in Maven repos, but this code should also work
> against 0.19 and perhaps even 0.18, and I'd compile against one of those
> instead if it were in a Maven repo.

FYI, Avro does not work with Hadoop 0.20 for CDH2 or CDH3 (I have not tried plain 0.20) because they include Jackson 1.0.1, and you'll get an exception like this:

2010-03-31 11:00:55,616 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
    at org.apache.avro.Schema.<clinit>(Schema.java:81)
    at com.rr.avro.ViewEvent.<clinit>(ViewEvent.java:5)
    at com.rr.eventdata.ViewRecord.<init>(ViewRecord.java:60)
    at com.rr.eventdata.AvroSerializable.<clinit>(AvroSerializable.java:17)
    at com.rr.eventdata.AvroFileReader.createClickDatumReader(AvroFileReader.java:50)

This happens because mappers and reducers don't live in their own classloader space, so the contents of the default Hadoop lib directory take priority in class load order.

>
>
>> hadoop mapreduce support for avro data
>> --------------------------------------
>>
>>                 Key: AVRO-493
>>                 URL: https://issues.apache.org/jira/browse/AVRO-493
>>             Project: Avro
>>          Issue Type: New Feature
>>          Components: java
>>            Reporter: Doug Cutting
>>            Assignee: Doug Cutting
>>         Attachments: AVRO-493.patch, AVRO-493.patch
>>
>>
>> Avro should provide support for using Hadoop MapReduce over Avro data files.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
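
P.S. For anyone chasing the Jackson NoSuchMethodError described above, here is a minimal sketch for checking which jar a class was actually loaded from inside a given JVM. The class name WhichJar and the helper locate() are made up for illustration and are not part of Avro or Hadoop. It relies only on the standard java.security.CodeSource API, and it assumes you can run it (or call locate() from a task) with the same classpath the mapper or reducer sees.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Avro or Hadoop.
public class WhichJar {

    /** Returns the location a class was loaded from, or a note if none is recorded. */
    static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // JDK bootstrap classes typically have no code source.
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace; pass another name to check it.
        String name = args.length > 0 ? args[0] : "org.codehaus.jackson.JsonFactory";
        try {
            // 'false' avoids running static initializers while we only inspect the class.
            Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
            System.out.println(name + " loaded from " + locate(cls));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " not found on the classpath");
        }
    }
}
```

If the reported location is a jar under the Hadoop lib directory rather than the one shipped with your job, that would be consistent with the load-order problem above.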