[ https://issues.apache.org/jira/browse/AVRO-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862738#action_12862738 ]
Harsh J Chouraria commented on AVRO-534:
----------------------------------------
Hello Doug,
Could you tell me, in simple steps, how to go about doing that? I have not been
doing Java development for long, but I am willing to do this :)
I see a WordCount test for Avro in trunk. Shall I extend that or write a custom
one?
> AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema
> -------------------------------------------------------------------------------
>
> Key: AVRO-534
> URL: https://issues.apache.org/jira/browse/AVRO-534
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.4.0
> Environment: ArchLinux, JAVA 1.6, Apache Hadoop (0.20.2), Apache Avro
> (trunk -- 1.4.0 SNAPSHOT), Using Avro Generic API (JAVA)
> Reporter: Harsh J Chouraria
> Priority: Trivial
> Fix For: 1.4.0
>
> Attachments: avro.mapreduce.r1.diff
>
>
> Consider an Avro file of a single record type with about 70 fields in the
> order (str, str, str, long, str, double, ...) [let's take only the first 6
> into consideration].
> To pass this into a simple MapReduce job I do:
> AvroInputFormat.addInputPath(...) and it works well with an IdentityMapper.
> Now I'd like to read only three fields, say fields 0, 1 and 3, so I supply a
> special schema with my 3 fields as (str (0), str (1), long (2)) using
> AvroJob.setInputGeneric(..., mySchema). This makes the MapReduce job fail,
> since the Avro record reader reads the file using its entire schema (of 70
> fields) and tries to convert my given 'long' field to 'str', as that is the
> type at index 2 of the actual schema (meaning it's using the actual schema
> embedded in the file, not the one I supplied!).
> The AvroRecordReader must support reading with the schema specified by the
> user via AvroJob.setInputGeneric.
> I've written a patch for it to do so, but am not sure if it's actually the
> right solution (MAP_OUTPUT_SCHEMA use?)
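To make the failure described above concrete, here is a minimal sketch in plain Java of the difference between reading a projected schema by position and resolving it by field name. It is only a toy model and does not use Avro or reflect how AvroRecordReader is implemented. The class ProjectionSketch, the field names f0..f5 and the sample values are all made up for illustration. The Avro specification does say that record fields are matched by name when a reader's schema is resolved against the writer's schema, which is what readByName imitates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of record projection. Hypothetical names; this is not Avro's API. */
public class ProjectionSketch {

    /** A field is just a name plus a type tag ("string", "long" or "double"). */
    static final class Field {
        final String name;
        final String type;
        Field(String name, String type) { this.name = name; this.type = type; }
    }

    // First six fields of the file's schema, as in the report (names are assumed).
    static final List<Field> FILE_SCHEMA = Arrays.asList(
        new Field("f0", "string"), new Field("f1", "string"), new Field("f2", "string"),
        new Field("f3", "long"), new Field("f4", "string"), new Field("f5", "double"));

    // One stored record, laid out in file-schema order (sample values).
    static final List<Object> STORED = Arrays.<Object>asList("a", "b", "c", 42L, "d", 1.5);

    // The projection the user asks for: fields 0, 1 and 3 of the file schema.
    static final List<Field> REQUESTED = Arrays.asList(
        new Field("f0", "string"), new Field("f1", "string"), new Field("f3", "long"));

    /** Positional read: the i-th requested field takes the i-th stored value. */
    static List<Object> readByPosition(List<Field> requested, List<Object> stored) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            out.add(check(requested.get(i), stored.get(i)));
        }
        return out;
    }

    /** Name-based read: each requested field is located in the file schema by name. */
    static List<Object> readByName(List<Field> requested, List<Field> fileSchema,
                                   List<Object> stored) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < fileSchema.size(); i++) {
            positions.put(fileSchema.get(i).name, i);
        }
        List<Object> out = new ArrayList<>();
        for (Field f : requested) {
            Integer pos = positions.get(f.name);
            if (pos == null) {
                throw new IllegalArgumentException("no such field in file: " + f.name);
            }
            out.add(check(f, stored.get(pos)));
        }
        return out;
    }

    /** Rejects a value whose Java type does not match the field's type tag. */
    static Object check(Field f, Object value) {
        boolean ok = (f.type.equals("string") && value instanceof String)
                  || (f.type.equals("long") && value instanceof Long)
                  || (f.type.equals("double") && value instanceof Double);
        if (!ok) {
            throw new ClassCastException(
                "field " + f.name + " expects " + f.type + " but got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Resolving by name picks up stored positions 0, 1 and 3.
        System.out.println(readByName(REQUESTED, FILE_SCHEMA, STORED));
        try {
            // Reading by position hits the string at index 2 where a long is expected.
            readByPosition(REQUESTED, STORED);
        } catch (ClassCastException e) {
            System.out.println("positional read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the positional read fails at the third requested field, mirroring the long/str mismatch at index 2 in the report, while the name-based read returns only the three requested values.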