[ https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008330#comment-13008330 ]

Chao Tian commented on PIG-1914:
--------------------------------

Yes, I agree that we have this problem. However, I think we can assume that the
JSON records in a single data file share a similar schema. Small differences
can be allowed, but the records should be broadly similar, right?

To deal with these small differences, we could define the schema of the loaded
tuple using the complete set of keys. I plan to support two ways of obtaining
the schema of the data: 1) the user passes a schema string that describes the
loaded data; 2) if the user passes nothing, the loader parses the first line of
the input data to derive the schema. Either way, the loaded data would always
have a schema, and that schema should be the complete set of keys. If some JSON
records do not contain some fields, those fields would be left as null in Pig.
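Below is a minimal Java sketch of the two schema options and the null-filling described above. It assumes each JSON line has already been parsed into a `Map<String, Object>` by some JSON library. That parsing step is not shown. The class and method names (`JsonSchemaSketch`, `schemaFromString`, `schemaFromFirstRecord`, `toTuple`) are hypothetical. A tuple is modeled as a plain `List<Object>` rather than Pig's own `Tuple` type. The schema-string format `name:type, ...` is also an assumption, and only the field names are used from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Pig or the piggybank.
class JsonSchemaSketch {

    // Option 1: derive field names from a user-supplied schema string such as
    // "name:chararray, age:int" (assumed format). Only the part before ':' is kept.
    static List<String> schemaFromString(String schema) {
        List<String> fields = new ArrayList<>();
        for (String part : schema.split(",")) {
            String name = part.trim();
            int colon = name.indexOf(':');
            if (colon >= 0) {
                name = name.substring(0, colon).trim();
            }
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    // Option 2: no schema given, so take the keys of the first record.
    // Assumes the record was parsed into an insertion-ordered map (e.g. LinkedHashMap).
    static List<String> schemaFromFirstRecord(Map<String, Object> firstRecord) {
        return new ArrayList<>(firstRecord.keySet());
    }

    // Lay a parsed record out according to the schema. Fields missing from the
    // record become null, and keys not in the schema are ignored.
    static List<Object> toTuple(List<String> schema, Map<String, Object> record) {
        List<Object> tuple = new ArrayList<>(schema.size());
        for (String field : schema) {
            tuple.add(record.get(field)); // Map.get returns null for an absent key
        }
        return tuple;
    }
}
```

For example, with the schema string `name:chararray, age:int, city`, a record `{"name": "ann", "age": 30}` becomes the tuple `[ann, 30, null]`. Because every tuple follows the same schema, records that omit a field still line up column by column.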

I think this approach could solve our problem. It would also let us support
columnar filtering in the future, i.e. loading only the desired columns of the
JSON data.
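The columnar filter could be sketched as follows. Given the positions of the requested columns in the full schema, the loader copies only those fields into the tuple. This is a hypothetical helper (`JsonColumnPruningSketch.projectColumns`) with the following assumptions:

* Each JSON record is already parsed into a `Map<String, Object>`.
* The schema is an ordered list of field names.
* A tuple is modeled as a plain `List<Object>`.

In Pig, this would presumably be driven by the loader's projection push-down (the `LoadPushDown` interface). That wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating column pruning for a JSON loader.
class JsonColumnPruningSketch {

    // Build a tuple holding only the requested columns, in the order requested.
    // requiredIndices are positions in the full schema; absent keys become null.
    static List<Object> projectColumns(List<String> schema, int[] requiredIndices,
                                       Map<String, Object> record) {
        List<Object> tuple = new ArrayList<>(requiredIndices.length);
        for (int idx : requiredIndices) {
            if (idx < 0 || idx >= schema.size()) {
                throw new IllegalArgumentException("column index out of range: " + idx);
            }
            tuple.add(record.get(schema.get(idx)));
        }
        return tuple;
    }
}
```

Consider the schema `[name, age, city]`, requested indices `{2, 0}`, and the record `{"name": "ann", "age": 30}`. The result is `[null, ann]`. The `age` value is never copied into the tuple.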


> Support load/store JSON data in Pig
> -----------------------------------
>
>                 Key: PIG-1914
>                 URL: https://issues.apache.org/jira/browse/PIG-1914
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Chao Tian
>
> JSON is a commonly used data storage format. It is popular for storing
> structured data, especially for JavaScript data exchange.
> Pig should have the ability to load/store JSON data. I plan to write
> one for the piggybank.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
