[ 
https://issues.apache.org/jira/browse/PARQUET-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153786#comment-14153786
 ] 

Julien Le Dem commented on PARQUET-110:
---------------------------------------

[~rdblue] I think we should open a PIG JIRA for this.

> Some schemas without column projection cause Pig failures
> ---------------------------------------------------------
>
>                 Key: PARQUET-110
>                 URL: https://issues.apache.org/jira/browse/PARQUET-110
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>
> Parquet stores and loads the Pig schema in the Configuration. Along the way, 
> Pig changes that Schema:
> {code:java}
> // This schema is converted from Parquet and written in Configuration
> String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
> // Reparsed using org.apache.pig.impl.util.Utils
> Schema schema = Utils.getSchemaFromString(schemaStr);
> // But no longer matches the original structure
> schema.toString();
> // => {my_list: {array_element: (num1: int,num2: int)}}
> {code}
> Note that the intermediate bag, named either "bag" or "array", is removed 
> when Pig reparses the Schema. I can work around this to an extent in the 
> Parquet code, but the Pig behavior gets stranger: if there are two such 
> intermediate bags, the second is preserved but renamed to "bag_0". 
> Something funny is going on there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
