[ 
https://issues.apache.org/jira/browse/PIG-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059938#comment-17059938
 ] 

Jianyong Dai commented on PIG-5400:
-----------------------------------

+1

> OrcStorage dropping struct(tuple) when it only holds a single field inside a 
> Bag(array)
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-5400
>                 URL: https://issues.apache.org/jira/browse/PIG-5400
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5400-v01.patch
>
>
> A user told me they were seeing an inconsistent schema when storing with
> OrcStorage. Sample code:
> {code} 
> A = load 'input.txt' as (a0:long); 
> B = GROUP A by a0; 
> STORE B into 'filename' using OrcStorage(); 
> {code} 
> Pig's schema {{B: {group: long,A: bag: { tuple(a0: long)}}}}. 
> Expected Orc schema {{struct<group:bigint,A:array<struct<bigint>>>}} 
> Actual Orc schema {{struct<group:bigint,A:array<bigint>>}} 
> _This only happens when a tuple contains a single field._ 
> The current schema without the struct(tuple) layer saves space, but it would
> be nice to have an option to keep the extra struct(tuple) layer if the user
> expects the schema to evolve by adding more fields to that tuple in the
> future.
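
The behaviour described in the quoted issue can be illustrated with a small standalone model. This is only a sketch in Python, not the OrcStorage code or the attached patch: the `Field` class, the `to_orc` function, and the `keep_single_field_struct` flag are hypothetical names made up for illustration, and the type mapping covers only a few primitives.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified stand-in for a Pig field schema.
@dataclass
class Field:
    name: str
    type: str                      # "long", "tuple", "bag", ...
    children: List["Field"] = field(default_factory=list)

# Partial Pig-to-ORC primitive mapping, enough for the example.
PRIMITIVES = {"long": "bigint", "int": "int",
              "chararray": "string", "double": "double"}

def to_orc(f: Field, keep_single_field_struct: bool = False) -> str:
    """Render an ORC-style type string for a Pig-style field."""
    keep = keep_single_field_struct
    if f.type in PRIMITIVES:
        return PRIMITIVES[f.type]
    if f.type == "tuple":
        inner = ",".join(f"{c.name}:{to_orc(c, keep)}" for c in f.children)
        return f"struct<{inner}>"
    if f.type == "bag":
        (tup,) = f.children        # a bag wraps exactly one tuple
        if len(tup.children) == 1 and not keep:
            # Reported behaviour: the single-field tuple is unwrapped.
            return f"array<{to_orc(tup.children[0], keep)}>"
        return f"array<{to_orc(tup, keep)}>"
    raise ValueError(f"unsupported type: {f.type}")

# Schema of B from the sample script: {group: long, A: {(a0: long)}}
B = Field("B", "tuple", [
    Field("group", "long"),
    Field("A", "bag", [Field("t", "tuple", [Field("a0", "long")])]),
])

print(to_orc(B))
print(to_orc(B, keep_single_field_struct=True))
```

With the flag off, the sketch yields the flattened `array<bigint>` form reported as the actual schema; with it on, the struct layer is kept. The sketch prints field names inside the struct, so the inner type appears as `struct<a0:bigint>`.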



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
