[ https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537604#comment-17537604 ]
Timothy Miller commented on PARQUET-2069:
-----------------------------------------

Well, I tried modifying prepareForRead to always reconstruct the Avro schema from the Parquet schema, but that caused another test to fail: org.apache.parquet.avro.TestGenericLogicalTypes. So in the end, what I decided to do was try using the Avro schema first; if that throws an exception, it falls back to conversion and tries again.

> Parquet file containing arrays, written by Parquet-MR, cannot be read again by Parquet-MR
> -----------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2069
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2069
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.12.0
>         Environment: Windows 10
>            Reporter: Devon Kozenieski
>            Priority: Blocker
>         Attachments: modified.parquet, original.parquet, parquet-diff.png
>
>
> The attached files include one original file and one modified file. The
> modified file is the result of reading the original file and writing it back
> with Parquet-MR, with a few values modified. The schema should not change,
> since the schema of the input file is used as the schema to write the output
> file. However, the output file has a slightly modified schema that then
> cannot be read back the same way with Parquet-MR, resulting in the
> exception message:
> java.lang.ClassCastException: optional binary element (STRING) is not a group
> My guess is that the issue lies in the Avro schema conversion.
> The attached Parquet files have some arrays and some nested fields.
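
For readers following the thread, the try-then-fall-back approach described in the comment above could be sketched roughly as follows. This is not the actual patch and does not use the parquet-avro API: the class SchemaFallback, the method readWithFallback, and the string stand-ins for schemas and readers are all hypothetical, chosen only to show the control flow (try the stored Avro schema; on an exception, convert from the Parquet schema and try once more).

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in parquet-avro.
final class SchemaFallback {

    /**
     * Tries to prepare a reader with the stored schema. If that throws,
     * obtains a schema by conversion and tries exactly once more.
     *
     * @param storedSchema schema saved in the file metadata (may be null)
     * @param convert      builds a schema from the file's own (Parquet) schema
     * @param prepare      builds a reader from a schema; may throw
     */
    public static <S, R> R readWithFallback(
            S storedSchema, Supplier<S> convert, Function<S, R> prepare) {
        if (storedSchema != null) {
            try {
                return prepare.apply(storedSchema);
            } catch (RuntimeException e) {
                // The stored schema did not match the file layout
                // (e.g. a ClassCastException); fall through to conversion.
            }
        }
        // Second attempt: any exception here propagates to the caller.
        return prepare.apply(convert.get());
    }

    public static void main(String[] args) {
        // Stand-in "prepare" step that rejects the stored schema.
        Function<String, String> prepare = schema -> {
            if (schema.equals("stored")) {
                throw new ClassCastException(
                        "optional binary element (STRING) is not a group");
            }
            return "reader(" + schema + ")";
        };
        System.out.println(readWithFallback("stored", () -> "converted", prepare));
    }
}
```

One consequence of this design is that a failure on the first attempt is swallowed silently, so a real implementation would probably want to log it; the second attempt is left to fail loudly.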