Ryan Blue created PARQUET-110:
---------------------------------

Summary: Some schemas without column projection cause Pig failures
Key: PARQUET-110
URL: https://issues.apache.org/jira/browse/PARQUET-110
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Ryan Blue
Parquet stores the Pig schema in the Configuration and later loads it back from there. Along the way, Pig changes that Schema:

{code:java}
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

// This schema is converted from Parquet and written to the Configuration
String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";

// Reparsed using org.apache.pig.impl.util.Utils
Schema schema = Utils.getSchemaFromString(schemaStr);

// But the result no longer matches the original structure
schema.toString();
// => {my_list: {array_element: (num1: int,num2: int)}}
{code}

Note that the intermediate bag, named either "bag" or "array", is removed when Pig reparses the Schema. I can work around this to an extent in the Parquet code, but Pig's behavior gets stranger: if there are two of these, the second is preserved but renamed to "bag_0". Pig does not handle these nested levels consistently when it reparses the schema.
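One way to catch this kind of round-trip loss in a regression test, without depending on Pig at all, is to compare how deeply the bag braces and tuple parentheses nest before and after reparsing. The sketch below is an illustration only: `SchemaNestingCheck` and `maxNestingDepth` are hypothetical names, not part of Pig or parquet-mr, and the two strings are copied from the report. Because the reported `toString()` output wraps the whole schema in an outer `{}`, the original string is wrapped the same way before comparing.

```java
public class SchemaNestingCheck {

    // Hypothetical helper (not a Pig or Parquet API): returns the deepest
    // nesting of bag braces {} and tuple parentheses () in a schema string.
    static int maxNestingDepth(String schema) {
        int depth = 0;
        int max = 0;
        for (char c : schema.toCharArray()) {
            if (c == '{' || c == '(') {
                depth++;
                max = Math.max(max, depth);
            } else if (c == '}' || c == ')') {
                depth--;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Schema string written to the Configuration (from the report).
        String original = "my_list: {array: (array_element: (num1: int,num2: int))}";
        // Schema.toString() output after Utils.getSchemaFromString (from the report).
        String reparsed = "{my_list: {array_element: (num1: int,num2: int)}}";

        // The reported toString() output has an outer {}, so add one to the
        // original string before comparing depths.
        int before = maxNestingDepth("{" + original + "}");
        int after = maxNestingDepth(reparsed);

        System.out.println("before=" + before + " after=" + after);
        if (before != after) {
            System.out.println("nesting level lost in round trip");
        }
    }
}
```

For the strings in this report it prints `before=4 after=3`, which shows that one level of nesting disappeared. A depth check only detects a missing level; it would not notice the "bag_0" renaming, which needs a name-by-name comparison of the two schemas.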