[ 
https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542280#comment-17542280
 ] 

ASF GitHub Bot commented on PARQUET-2069:
-----------------------------------------

islamismailov commented on PR #957:
URL: https://github.com/apache/parquet-mr/pull/957#issuecomment-1137996922

   This affects both "list" and "map" types, not just lists. If you're using 
Iceberg, good news: you can apply this PR to your Iceberg branch: 
https://github.com/apache/iceberg/pull/3309
   
   Link to the original issue: https://github.com/apache/iceberg/issues/2962
   
   This worked for us. If you would still rather fix it in Parquet, you might 
be interested in the following change, or something along those lines (not 
recommended as is, since I haven't fully tested it):
   
       commit 1918276ec7f01279cb9906b9378cb8986f6ad3ea
       Author: Islam Ismailov <[email protected]>
       Date:   Wed May 25 19:03:33 2022 +0000
       
           Attempt a fix on avro-parquet conversion
       
       diff --git a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java
       index 7d1f3cab..960aae22 100644
       -

> Parquet file containing arrays, written by Parquet-MR, cannot be read again 
> by Parquet-MR
> -----------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2069
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2069
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.12.0
>         Environment: Windows 10
>            Reporter: Devon Kozenieski
>            Priority: Blocker
>         Attachments: modified.parquet, original.parquet, parquet-diff.png
>
>
> The attachments include one original file and one modified file, produced 
> by reading the original file and writing it back with Parquet-MR with a few 
> values changed. The schema should not change, since the schema of the input 
> file is used as the schema to write the output file. However, the output 
> file has a slightly different schema that Parquet-MR then cannot read back 
> the same way, failing with the exception message:
> java.lang.ClassCastException: optional binary element (STRING) is not a group
> My guess is that the issue lies in the Avro schema conversion.
> The attached Parquet files contain some arrays and some nested fields.
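
The exception quoted in the description has the shape you get when code expects a list element to be a group (a record wrapper) but finds a primitive instead. Here is a minimal, dependency-free sketch of that failure mode. Every class and method name in it (`SchemaNode`, `PrimitiveNode`, `GroupNode`, `describeElementAsRecord`, and so on) is a hypothetical stand-in and not the parquet-mr API. It is an assumption about the mechanism, not a confirmed root cause of this issue.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for schema nodes; these are NOT the parquet-mr classes.
abstract class SchemaNode {
    final String repetition;
    final String name;

    SchemaNode(String repetition, String name) {
        this.repetition = repetition;
        this.name = name;
    }

    abstract boolean isPrimitive();

    // Asking a primitive node for its group view fails, with a message
    // shaped like the one in the report (assumed mechanism).
    GroupNode asGroup() {
        if (isPrimitive()) {
            throw new ClassCastException(this + " is not a group");
        }
        return (GroupNode) this;
    }
}

final class PrimitiveNode extends SchemaNode {
    final String physicalType;
    final String logicalType;

    PrimitiveNode(String repetition, String physicalType, String name, String logicalType) {
        super(repetition, name);
        this.physicalType = physicalType;
        this.logicalType = logicalType;
    }

    @Override
    boolean isPrimitive() {
        return true;
    }

    @Override
    public String toString() {
        return repetition + " " + physicalType + " " + name + " (" + logicalType + ")";
    }
}

final class GroupNode extends SchemaNode {
    final List<SchemaNode> fields;

    GroupNode(String repetition, String name, List<SchemaNode> fields) {
        super(repetition, name);
        this.fields = fields;
    }

    @Override
    boolean isPrimitive() {
        return false;
    }

    @Override
    public String toString() {
        return repetition + " group " + name;
    }
}

class ListElementMismatchSketch {
    // A reader that assumes every list element is wrapped in a record.
    static String describeElementAsRecord(GroupNode repeated) {
        SchemaNode element = repeated.fields.get(0);
        return "record:" + element.asGroup().name;
    }

    // A reader that checks the element kind before converting.
    static String describeElement(GroupNode repeated) {
        SchemaNode element = repeated.fields.get(0);
        return (element.isPrimitive() ? "primitive:" : "record:") + element.name;
    }

    // The repeated level of a list of strings whose element is a plain primitive.
    static GroupNode stringList() {
        SchemaNode element = new PrimitiveNode("optional", "binary", "element", "STRING");
        return new GroupNode("repeated", "list", Collections.singletonList(element));
    }

    public static void main(String[] args) {
        GroupNode repeated = stringList();
        System.out.println(describeElement(repeated));
        try {
            describeElementAsRecord(repeated);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the tolerant reader handles the primitive element, while the reader that assumes a record wrapper fails on the same schema. That matches the symptom of a file that reads under one set of list-structure assumptions but not another.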



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
