[
https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008330#comment-13008330
]
Chao Tian commented on PIG-1914:
--------------------------------
Yeah, I agree with you that we have this problem. However, I think we should
assume that the JSON records in the same data file have similar schemas. Small
differences could be allowed, but the records should be similar, right?
To deal with these small differences, we could define the schema for the loaded
tuple by using the complete set of keys. I plan to have two methods of loading
the schema of the data: 1) The user could pass a schema string which indicates
the schema of the loaded data. 2) If the user passes nothing, the loader would
parse the first line of the input data to get the schema. Either way, the
loaded data would have a schema. This schema should be the complete set of the
keys. If some JSON records do not contain some fields, those fields would be
left as null in Pig.
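To make the two options concrete, here is a rough sketch in Python rather than
actual Pig loader code. The function names and the "name:type" schema string
format are assumptions for illustration only; the types are ignored and only
the field names are used. Note that with option 2, any key that is missing from
the first line is not part of the schema and is dropped from later records.

```python
import json


def parse_schema_string(schema):
    """Return the field names from a string like 'id:int, name:chararray'.

    Hypothetical format: types are ignored in this sketch, only names are kept.
    """
    return [part.split(":")[0].strip() for part in schema.split(",") if part.strip()]


def load_json_lines(lines, schema=None):
    """Return (fields, tuples) for an iterable of JSON-object lines.

    Option 1: the caller passes a schema string.
    Option 2: no schema is passed, so the keys of the first record are used.
    Fields missing from a record are filled with None (null in Pig).
    """
    records = [json.loads(line) for line in lines if line.strip()]
    if schema is not None:
        fields = parse_schema_string(schema)
    elif records:
        fields = list(records[0].keys())
    else:
        fields = []
    tuples = [tuple(rec.get(name) for name in fields) for rec in records]
    return fields, tuples


sample = ['{"id": 1, "name": "a"}', '{"id": 2}']
inferred_fields, inferred_rows = load_json_lines(sample)
declared_fields, declared_rows = load_json_lines(
    sample, "id:int, name:chararray, city:chararray"
)
```

In this sketch the second record has no "name" key, so that position comes back
as None, and the declared "city" field is None for both records.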
I think this method could solve our problem. With this method, we could also
support a columnar filter in the future, which means loading just the desired
columns of the JSON data.
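A columnar filter on top of a known schema could look roughly like the
following. Again this is only a Python sketch under my own assumptions, not
Pig's API: load_columns and its parameters are made-up names, and the schema is
given as a plain list of field names.

```python
import json


def load_columns(lines, fields, wanted):
    """Yield one tuple per JSON line, containing only the wanted columns.

    `fields` is the full schema (list of names) and `wanted` is the subset to
    load. Both names are hypothetical. Missing values come back as None.
    """
    unknown = [name for name in wanted if name not in fields]
    if unknown:
        raise ValueError("columns not in schema: %s" % ", ".join(unknown))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        yield tuple(record.get(name) for name in wanted)


sample = ['{"id": 1, "name": "a", "city": "x"}', '{"id": 2}']
names_only = list(load_columns(sample, ["id", "name", "city"], ["name"]))
```

This sketch still parses each whole record and only projects afterwards, so it
shows the interface rather than any saving in parsing cost.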
> Support load/store JSON data in Pig
> -----------------------------------
>
> Key: PIG-1914
> URL: https://issues.apache.org/jira/browse/PIG-1914
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.8.0
> Reporter: Chao Tian
>
> JSON is a commonly used data storage format. It is popular for storing
> structured data, especially for JavaScript data exchange.
> Pig should have the ability to load/store JSON format data. I plan to write
> a JSON loader/storer for the piggy bank.