[
https://issues.apache.org/jira/browse/AVRO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926441#action_12926441
]
Garrett Wu commented on AVRO-593:
---------------------------------
I'm also interested in using the newer mapreduce API with Avro, so I'm trying
to write an AvroWritable and some input and output format classes that know how
to deal with the schemas. I should have a patch next week, but the idea is:
- Introduce new classes AvroKey and AvroValue that implement Writable.
- Users can call AvroJob.setInputKeySchema(), AvroJob.setInputValueSchema(),
AvroJob.setMapOutputKeySchema(), AvroJob.setMapOutputValueSchema(),
AvroJob.setReduceOutputKeySchema(), AvroJob.setReduceOutputValueSchema() as
needed.
- Provide AvroContainerFileInputFormat, AvroContainerFileOutputFormat,
AvroSequenceFileInputFormat, and AvroSequenceFileOutputFormat, which read and
write the schemas for the data appropriately. For sequence files, the schema
can be stored in the header's metadata.
- Users can write Mappers and Reducers as they normally would. Note that this
differs slightly from the org.apache.avro.mapred.* way of doing things -- I
don't plan to supply special AvroMapper and AvroReducer base classes or a new
Serialization, since the AvroKey/AvroValue classes are Writable just like any
other Hadoop key/value type.
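To make the outline above concrete, here is a rough sketch, not the patch itself. It
assumes nothing beyond the JDK: the Writable interface below is a local stand-in with
the same two methods as org.apache.hadoop.io.Writable, a plain Map stands in for the
Hadoop job configuration, and SketchAvroKey, SketchAvroJob, SketchDemo and the property
name are all hypothetical. The datum is carried as already-encoded bytes; a real
AvroKey would presumably encode and decode it with Avro using the schema registered on
the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Local stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical key wrapper: holds one datum as pre-encoded bytes.
class SketchAvroKey implements Writable {
    private byte[] encoded = new byte[0];

    byte[] get() { return encoded; }
    void set(byte[] encoded) { this.encoded = encoded; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(encoded.length); // length prefix so readFields knows how much to read
        out.write(encoded);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        encoded = new byte[in.readInt()];
        in.readFully(encoded);
    }
}

// Hypothetical stand-in for one of the proposed AvroJob setters.
class SketchAvroJob {
    // Assumed property name, chosen for illustration only.
    static final String MAP_OUTPUT_KEY_SCHEMA = "sketch.avro.schema.map.output.key";

    static void setMapOutputKeySchema(Map<String, String> conf, String schemaJson) {
        conf.put(MAP_OUTPUT_KEY_SCHEMA, schemaJson);
    }

    static String getMapOutputKeySchema(Map<String, String> conf) {
        return conf.get(MAP_OUTPUT_KEY_SCHEMA);
    }
}

class SketchDemo {
    // Writes a key to a byte stream and reads it back into a fresh instance,
    // roughly what the framework does when moving keys between map and reduce.
    static byte[] roundTrip(byte[] datum) {
        try {
            SketchAvroKey written = new SketchAvroKey();
            written.set(datum);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            written.write(new DataOutputStream(buffer));

            SketchAvroKey read = new SketchAvroKey();
            read.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return read.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the wrapper is an ordinary Writable, a Mapper or Reducer can declare it as a
key type like any other, which is why no special base classes are needed. The schema
is kept in the job configuration rather than in each record, so each serialized key
carries only its encoded bytes.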
> Avro mapreduce apis incompatible with hadoop 0.20.2
> ---------------------------------------------------
>
> Key: AVRO-593
> URL: https://issues.apache.org/jira/browse/AVRO-593
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.3.2, 1.3.3
> Environment: Avro 1.3.3, Hadoop 0.20.2
> Reporter: Steve Severance
>
> The Avro APIs for Hadoop use the Hadoop mapreduce API that has been
> deprecated. A new Avro mapreduce API should be implemented for Hadoop 0.20
> and higher.