AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema
-------------------------------------------------------------------------------

                 Key: AVRO-534
                 URL: https://issues.apache.org/jira/browse/AVRO-534
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.4.0
         Environment: ArchLinux, JAVA 1.6, Apache Hadoop (0.20.2), Apache Avro 
(trunk -- 1.4.0 SNAPSHOT), Using Avro Generic API (JAVA)
            Reporter: Harsh J Chouraria
            Priority: Trivial
             Fix For: 1.4.0


Consider an Avro file containing a single record type with about 70 fields, in 
the order (str, str, str, long, str, double, ...) [let's consider only the 
first six].
To pass this into a simple MapReduce job I call 
AvroInputFormat.addInputPath(...), and it works well with an IdentityMapper.

Now I'd like to read only three fields, say fields 0, 1 and 3, so I supply a 
projection schema containing just those fields, (str (0), str (1), long (2)), 
via AvroJob.setInputGeneric(..., mySchema). This causes the MapReduce job to 
fail: the Avro record reader reads the file with its entire embedded schema 
(of 70 fields), so at index 2 it finds a 'str' where my projection schema 
declares a 'long' (meaning it's using the schema embedded in the file, not the 
one I supplied!).

The AvroRecordReader must support reading in the schema specified by the user 
using AvroJob.setInputGeneric.
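For reference, the Avro Generic API already supports this kind of projection outside MapReduce: when a reader ("expected") schema is set on a GenericDatumReader, Avro resolves it by field name against the writer schema embedded in the file. A minimal sketch of that mechanism, with hypothetical record/field names and file path (the record name in the projection schema must match the one written to the file for resolution to succeed):

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ProjectionSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical projection schema: only the three fields we care about.
    // Avro matches reader fields to writer fields by *name*, not position.
    Schema readerSchema = Schema.parse(
        "{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":["
      + "{\"name\":\"f0\",\"type\":\"string\"},"
      + "{\"name\":\"f1\",\"type\":\"string\"},"
      + "{\"name\":\"f3\",\"type\":\"long\"}]}");

    // The writer schema is taken from the file; the reader schema is the
    // projection we "expect", and Avro resolves one against the other.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>();
    datumReader.setExpected(readerSchema);

    DataFileReader<GenericRecord> fileReader =
        new DataFileReader<GenericRecord>(new File("input.avro"), datumReader);
    while (fileReader.hasNext()) {
      GenericRecord rec = fileReader.next();
      System.out.println(rec.get("f3")); // only projected fields are present
    }
    fileReader.close();
  }
}
```

AvroRecordReader presumably needs to do the same thing: pass the schema from AvroJob.setInputGeneric as the expected schema instead of reading with the file's schema alone.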

I've written a patch that does this, but I'm not sure it's actually the right 
solution (should MAP_OUTPUT_SCHEMA be used?).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.