[ https://issues.apache.org/jira/browse/AVRO-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated AVRO-1357:
--------------------------------
Attachment: AVRO-1357.patch
Uploaded a patch for lang/java/mapred that implements the feature.
> Allow to force reading generic records for input data and map output data
> -------------------------------------------------------------------------
>
> Key: AVRO-1357
> URL: https://issues.apache.org/jira/browse/AVRO-1357
> Project: Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.7.4
> Reporter: Xiangrui Meng
> Attachments: AVRO-1357.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> In AvroJob/AvroInputFormat/AvroRecordReader, we can choose either
> SpecificDatumReader or ReflectDatumReader to read input data and map output
> data, but not GenericDatumReader. We may want to force reading generic
> records for some jobs.
> For example, assume that the input records contain a field called "category"
> and we want to count the records in each category. If we can force reading
> generic records, we can get the category string by calling get("category").
> Otherwise, the input record might be loaded as either a GenericRecord
> instance or a SpecificRecord instance. The latter does not implement
> GenericRecord, so get("category") is not available on it.
> To add this feature, we can change the booleans
> IS_REFLECT/MAP_OUTPUT_IS_REFLECT into enums called
> INPUT_AVRO_DESERIALIZATION_TYPE/MAP_OUTPUT_AVRO_DESERIALIZATION_TYPE, and
> return the corresponding DatumReader based on the type.
> We can add
> setDeserializationType/setInputDeserializationType/setMapOutputDeserializationType
> to AvroJob while deprecating setReflect/setInputReflect/setMapOutputReflect.
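
The enum-based reader selection proposed above could look roughly like the sketch below. It is not taken from the attached patch. The enum name DeserializationType, the helper names readerFor and fromLegacyReflectFlag, and the DatumReader stand-in interface are all hypothetical and defined locally, so the example compiles without Avro on the classpath. Real code would return Avro's GenericDatumReader, SpecificDatumReader, or ReflectDatumReader instead.

```java
public class DeserializationTypeSketch {

    /** Hypothetical enum that would replace the IS_REFLECT / MAP_OUTPUT_IS_REFLECT booleans. */
    enum DeserializationType { GENERIC, SPECIFIC, REFLECT }

    /** Local stand-in for Avro's DatumReader; it only reports which reader was chosen. */
    interface DatumReader {
        String name();
    }

    /** Returns the reader that corresponds to the configured deserialization type. */
    static DatumReader readerFor(DeserializationType type) {
        switch (type) {
            case GENERIC:
                return () -> "GenericDatumReader";
            case SPECIFIC:
                return () -> "SpecificDatumReader";
            case REFLECT:
                return () -> "ReflectDatumReader";
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    /**
     * Maps the old boolean onto the enum so that deprecated setReflect-style
     * setters could delegate to the new ones (assumed mapping: true means
     * reflect, false means specific).
     */
    static DeserializationType fromLegacyReflectFlag(boolean isReflect) {
        return isReflect ? DeserializationType.REFLECT : DeserializationType.SPECIFIC;
    }

    public static void main(String[] args) {
        // A job that wants get("category") on every record would force GENERIC.
        System.out.println(readerFor(DeserializationType.GENERIC).name());
        // A legacy caller that passed isReflect = false still gets a specific reader.
        System.out.println(readerFor(fromLegacyReflectFlag(false)).name());
    }
}
```

Keeping the legacy boolean mapping in one helper means the deprecated setters and the new enum-based setters would share a single code path for choosing the reader.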