[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985972#action_12985972
 ] 

Jakob Homan commented on PIG-1748:
----------------------------------

@Scott
I can't say I'm convinced, and your example in fact leaves me more concerned, 
given that this approach essentially builds dependencies on all of those 
projects into Avro.  However, this JIRA isn't the best place to discuss this.  
Is there a discussion about this type of integration going on in Avro that the 
community can contribute to?  Is there a JIRA?  Thanks.

> Add load/store function AvroStorage for avro data
> -------------------------------------------------
>
>                 Key: PIG-1748
>                 URL: https://issues.apache.org/jira/browse/PIG-1748
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: lin guo
>            Assignee: Jakob Homan
>         Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies between the Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined 
> records because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable unions like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, the Avro schema of the output data is derived from its 
> Pig schema.
> Detailed documentation can be found at 
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
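
For illustration only, the two schema restrictions in the description above 
(no recursively defined records, only nullable unions) can be sketched as 
checks over an Avro schema that has already been parsed from JSON into plain 
Python objects. This is not the AvroStorage code from the patch. The function 
names are hypothetical, a "nullable union" is assumed to mean exactly two 
branches with one of them "null", and namespaces are ignored for simplicity.

```python
def is_nullable_union(schema):
    """Return True for a union shaped like ["null", "some-type"].

    Assumption: exactly two branches, exactly one of which is "null".
    """
    return (
        isinstance(schema, list)
        and len(schema) == 2
        and sum(1 for branch in schema if branch == "null") == 1
    )


def has_recursive_record(schema, open_records=frozenset()):
    """Return True if a record refers to itself or to an enclosing record.

    `open_records` holds the names of the records currently being walked.
    Names are compared as written; namespaces are not resolved.
    """
    if isinstance(schema, str):
        # A bare name is either a primitive or a reference to a named type.
        return schema in open_records
    if isinstance(schema, list):
        # A union: recursive if any branch is.
        return any(has_recursive_record(b, open_records) for b in schema)
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            inner = open_records | {schema["name"]}
            return any(
                has_recursive_record(field["type"], inner)
                for field in schema["fields"]
            )
        if kind == "array":
            return has_recursive_record(schema["items"], open_records)
        if kind == "map":
            return has_recursive_record(schema["values"], open_records)
        if kind is not None:
            # Primitive, enum, fixed, a named reference, or a nested schema.
            return has_recursive_record(kind, open_records)
    return False


# A linked list is the classic recursively defined record.
LINKED_LIST = {
    "type": "record",
    "name": "LinkedList",
    "fields": [
        {"name": "value", "type": "int"},
        {"name": "next", "type": ["null", "LinkedList"]},
    ],
}
```

Under these assumptions, has_recursive_record(LINKED_LIST) is true, so a 
schema like this would fall outside the supported subset, while its "next" 
field on its own passes is_nullable_union.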

