Ryan Blue created PARQUET-110:
---------------------------------

             Summary: Some schemas without column projection cause Pig failures
                 Key: PARQUET-110
                 URL: https://issues.apache.org/jira/browse/PARQUET-110
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Ryan Blue


Parquet stores the Pig schema as a string in the Configuration and loads it
back later. When Pig reparses that string, the schema changes:

{code:java}
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

// This schema string is converted from Parquet and written to the Configuration
String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
// It is reparsed using org.apache.pig.impl.util.Utils
Schema schema = Utils.getSchemaFromString(schemaStr);
// The reparsed schema no longer matches the original structure
schema.toString();
// => {my_list: {array_element: (num1: int,num2: int)}}
{code}

Note that the intermediate bag, named either "bag" or "array", is removed when
Pig reparses the schema. I can work around this to an extent in the Parquet
code, but the Pig behavior gets stranger: if there are two of these bags, the
second is preserved but renamed to "bag_0". Something funny is going on there.
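
One way to see the mismatch without running Pig is to compare the string that
was written with the string that came back. The sketch below is only an
illustration, assuming the two strings shown above: SchemaRoundTrip and its
methods are hypothetical names, not part of Parquet or Pig, and the comparison
is purely textual (it ignores whitespace and the outer braces that toString()
adds in the example).

```java
// Hypothetical helper, not part of Parquet or Pig: it compares schema
// strings textually and does not parse them the way Pig does.
final class SchemaRoundTrip {

    // Strip whitespace, then drop one pair of enclosing braces if the brace
    // opened at index 0 is closed by the very last character.
    static String normalize(String schema) {
        String s = schema.replaceAll("\\s+", "");
        if (s.startsWith("{") && closesAtEnd(s)) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    // True when the brace opened at index 0 is closed by the last character.
    static boolean closesAtEnd(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i == s.length() - 1;
                }
            }
        }
        return false;
    }

    // True when the reparsed string matches the written one after normalizing.
    static boolean preserved(String written, String reparsed) {
        return normalize(written).equals(normalize(reparsed));
    }

    public static void main(String[] args) {
        // The two strings from the example above.
        String written = "my_list: {array: (array_element: (num1: int,num2: int))}";
        String reparsed = "{my_list: {array_element: (num1: int,num2: int)}}";
        System.out.println(preserved(written, reparsed));
    }
}
```

A check like this only detects that the intermediate bag was dropped or
renamed; it does not restore the original structure.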



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
