[ 
https://issues.apache.org/jira/browse/AVRO-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Mazak updated AVRO-1699:
-----------------------------
    Affects Version/s: 1.7.6
               Status: Patch Available  (was: Open)

Attaching the AutoMap utility we wrote and have been using.

> AutoMap field values between Avro objects with different schemas
> ----------------------------------------------------------------
>
>                 Key: AVRO-1699
>                 URL: https://issues.apache.org/jira/browse/AVRO-1699
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Paul Mazak
>
> There are a few use cases for this:
> *Various Avro input data to one common output*
> You want to pick up Avro files in different schemas and normalize them into 
> one. You might wish to transform them to the superset of the input schemas.
> *Aggregating Raw Data*
> You want to rewrite data grouped by some fields and aggregated.  The output 
> schema in this case would be a subset of the input schema, where at least the 
> group-by fields are in both input and output schemas.
> *Alternate Views*
> You have Avro data that you want to trim different ways to create subsets 
> that would be useful for views in Hive or exports for SQL tables.
> *Schema Migration*
> You've added fields to a schema and you are storing data in both the old and 
> new schemas.  You can't process Avro data in the old schema together with 
> data in the new schema (using Pig or Java MapReduce).  AutoMapping would 
> up-convert your old data by setting the newly added fields to null, so that 
> all data are in the new schema.  This was 
> [asked about|http://stackoverflow.com/questions/27131942/is-it-possible-to-retrieve-schema-from-avro-data-and-use-them-in-mapreduce]
>  on Stack Overflow.
> _Considerations:_
>  * Loop over the source schema fields available for automapping and return 
> any that could not be mapped.
>  * Allow mappings between compatible types. For example going from integers 
> to longs, floats to strings, etc.
>  * Match field names case-sensitively.
>  * Make use of aliases in the schema when considering fields to automap.
>  * Deep copy nested structures like arrays and maps.
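
The considerations above can be sketched roughly as follows. This is not the attached AutoMap utility and it does not use the Avro API: records are modelled as plain java.util.Map instances so the example runs on its own, and the names AutoMapSketch, TargetField, autoMap, convert and deepCopy are hypothetical. It also assumes that aliases are declared on the target field and name the source fields they replace, and it sets target fields that have no source value to null, as in the schema migration case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the automap idea; plain maps stand in for Avro records. */
final class AutoMapSketch {

    /** Hypothetical target field descriptor: name, expected Java type, aliases. */
    static final class TargetField {
        final String name;
        final Class<?> type;
        final List<String> aliases;

        TargetField(String name, Class<?> type, String... aliases) {
            this.name = name;
            this.type = type;
            this.aliases = Arrays.asList(aliases);
        }

        /** Case-sensitive match on the field name or on one of its aliases. */
        boolean accepts(String sourceName) {
            return name.equals(sourceName) || aliases.contains(sourceName);
        }
    }

    /** Sentinel returned by convert() when no compatible conversion exists. */
    static final Object NO_CONVERSION = new Object();

    /** Copies mappable source fields into target; returns the source field names left unmapped. */
    static List<String> autoMap(Map<String, Object> source,
                                List<TargetField> targetSchema,
                                Map<String, Object> target) {
        List<String> unmapped = new ArrayList<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            TargetField match = null;
            for (TargetField field : targetSchema) {
                if (field.accepts(entry.getKey())) {
                    match = field;
                    break;
                }
            }
            Object converted = (match == null)
                    ? NO_CONVERSION
                    : convert(entry.getValue(), match.type);
            if (converted == NO_CONVERSION) {
                unmapped.add(entry.getKey());
            } else {
                target.put(match.name, converted);
            }
        }
        // Target fields with no source value are set to null (the up-conversion case).
        for (TargetField field : targetSchema) {
            if (!target.containsKey(field.name)) {
                target.put(field.name, null);
            }
        }
        return unmapped;
    }

    /** Same-type values are deep copied; int-to-long and number-to-string are promoted. */
    static Object convert(Object value, Class<?> type) {
        if (value == null) {
            return null;
        }
        if (type.isInstance(value)) {
            return deepCopy(value);
        }
        if (type == Long.class && value instanceof Integer) {
            return ((Integer) value).longValue();
        }
        if (type == String.class && value instanceof Number) {
            return value.toString();
        }
        return NO_CONVERSION;
    }

    /** Recursively copies lists and maps; other values are treated as immutable scalars. */
    static Object deepCopy(Object value) {
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                copy.put(e.getKey(), deepCopy(e.getValue()));
            }
            return copy;
        }
        return value;
    }
}
```

In this sketch a source field "temp" would land in a target field "temperature" that lists "temp" as an alias, while a source field "ID" would be reported as unmapped against a target field "id" because matching is case-sensitive.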



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
