[ https://issues.apache.org/jira/browse/AVRO-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862738#action_12862738 ]
Harsh J Chouraria commented on AVRO-534:
----------------------------------------

Hello Doug,

Could you tell me in simple points how to go about doing that? I have not been in Java development for long, but I am willing to do this :)

I see a WordCount test for Avro in trunk. Shall I extend that or write a custom one?

> AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-534
>                 URL: https://issues.apache.org/jira/browse/AVRO-534
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>         Environment: ArchLinux, JAVA 1.6, Apache Hadoop (0.20.2), Apache Avro (trunk -- 1.4.0 SNAPSHOT), Using Avro Generic API (JAVA)
>            Reporter: Harsh J Chouraria
>            Priority: Trivial
>             Fix For: 1.4.0
>
>         Attachments: avro.mapreduce.r1.diff
>
>
> Consider an Avro file of a single record type with about 70 fields in the
> order (str, str, str, long, str, double, ...). Let's take only the first 6
> into consideration.
> To pass this into a simple MapReduce job I call
> AvroInputFormat.addInputPath(...), and it works well with an IdentityMapper.
> Now I'd like to read only three fields, say fields 0, 1 and 3, so I supply a
> special schema with my 3 fields as (str (0), str (1), long (2)) using
> AvroJob.setInputGeneric(..., mySchema). This makes the MapReduce job fail,
> because the Avro record reader reads the file with its entire schema (of 70
> fields) and tries to convert my given 'long' field to 'str', since that is
> the type at index 2 of the actual schema (meaning it is using the actual
> schema embedded in the file, not the one I supplied!).
> The AvroRecordReader must support reading with the schema specified by the
> user through AvroJob.setInputGeneric.
> I've written a patch that does this, but I am not sure if it is actually
> the right solution (MAP_OUTPUT_SCHEMA use?)
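
The failure described in the issue can be reproduced without Avro or Hadoop. The sketch below is a plain-Java simulation, not Avro API: the `ProjectionSketch` class, the field names `f0` to `f5`, and the sample values are all hypothetical. It assumes the file's first six fields are (str, str, str, long, str, double) and that the user asks for fields 0, 1 and 3. Reading by position checks the user's third field (long) against the file's value at index 2 (a string) and fails, whereas looking each requested field up by name in the file's schema picks the intended column. Avro's own schema resolution also matches record fields by name, though the real reader does considerably more than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the problem in AVRO-534. None of these types are
// Avro classes; field names f0..f5 and the sample values are hypothetical.
final class ProjectionSketch {

    /** One named, typed field of a record schema. */
    static final class Field {
        final String name;
        final Class<?> type;

        Field(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }
    }

    /** First six fields of the file's schema: str, str, str, long, str, double. */
    static List<Field> fileSchema() {
        return Arrays.asList(
            new Field("f0", String.class), new Field("f1", String.class),
            new Field("f2", String.class), new Field("f3", Long.class),
            new Field("f4", String.class), new Field("f5", Double.class));
    }

    /** The user's three-field schema: fields 0, 1 and 3 of the file. */
    static List<Field> projection() {
        return Arrays.asList(
            new Field("f0", String.class), new Field("f1", String.class),
            new Field("f3", Long.class));
    }

    /** One stored record, laid out in file-schema order. */
    static List<Object> sampleRecord() {
        return Arrays.<Object>asList("a", "b", "c", 42L, "e", 1.5);
    }

    /** Returns the value if it matches the field's type, otherwise throws. */
    private static Object checked(Field field, Object value) {
        if (!field.type.isInstance(value)) {
            throw new ClassCastException("field " + field.name + " expects "
                + field.type.getSimpleName() + " but found "
                + value.getClass().getSimpleName());
        }
        return value;
    }

    /** Positional read: the i-th requested field takes the i-th stored value. */
    static List<Object> readByPosition(List<Field> requested, List<Object> stored) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            out.add(checked(requested.get(i), stored.get(i)));
        }
        return out;
    }

    /** Name-based read: find each requested field's position in the file schema. */
    static List<Object> readByName(List<Field> requested, List<Field> file,
                                   List<Object> stored) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < file.size(); i++) {
            positions.put(file.get(i).name, i);
        }
        List<Object> out = new ArrayList<>();
        for (Field field : requested) {
            Integer pos = positions.get(field.name);
            if (pos == null) {
                throw new IllegalArgumentException("no field named " + field.name);
            }
            out.add(checked(field, stored.get(pos)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readByName(projection(), fileSchema(), sampleRecord()));
        try {
            readByPosition(projection(), sampleRecord());
        } catch (ClassCastException e) {
            System.out.println("positional read failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints the three projected values from the name-based read and then the type mismatch raised by the positional read, which mirrors the 'long' versus 'str' conversion error reported above.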