Devon Kozenieski created PARQUET-2069:
-----------------------------------------

             Summary: Parquet file containing arrays, written by Parquet-MR, 
cannot be read again by Parquet-MR
                 Key: PARQUET-2069
                 URL: https://issues.apache.org/jira/browse/PARQUET-2069
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
    Affects Versions: 1.12.0
         Environment: Windows 10
            Reporter: Devon Kozenieski
         Attachments: modified.parquet, original.parquet

The attached files are an original Parquet file and a modified file. The 
modified file was produced by reading the original with Parquet-MR, changing a 
few values, and writing it back. The schema should be unchanged, since the 
schema of the input file is used as the schema for writing the output file. 
However, the output file ends up with a slightly different schema, and 
Parquet-MR cannot read it back the same way. The read fails with the exception 
message:  java.lang.ClassCastException: optional binary element (STRING) is not 
a group

My guess is that the issue lies in the Avro schema conversion.

The attached Parquet files contain some arrays and some nested fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
