[ https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556406#comment-17556406 ]
ASF GitHub Bot commented on PARQUET-2069:
-----------------------------------------

theosib-amazon commented on code in PR #957:
URL: https://github.com/apache/parquet-mr/pull/957#discussion_r901740673


##########
parquet-avro/src/test/java/org/apache/parquet/avro/TestArrayListCompatibility.java:
##########
@@ -0,0 +1,51 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.avro;
+
+import com.google.common.io.Resources;
+import org.apache.avro.generic.GenericData;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.junit.Test;
+import java.io.IOException;
+
+public class TestArrayListCompatibility {
+
+  @Test
+  public void testListArrayCompatibility() throws IOException {
+    Path testPath = new Path(Resources.getResource("list-array-compat.parquet").getFile());
+
+    Configuration conf = new Configuration();
+    ParquetReader<Object> parquetReader =
+      AvroParquetReader.builder(testPath).withConf(conf).build();
+    GenericData.Record firstRecord;
+    try {
+      firstRecord = (GenericData.Record) parquetReader.read();
+    } catch (Exception x) {
+      x.printStackTrace();

Review Comment:
   Ok, I got rid of the extra catch. I'm not sure what kind of exceptions parquetReader.read() can throw, though, so we'll see if we get a compile error from not specifying it in the function signature. :)



> Parquet file containing arrays, written by Parquet-MR, cannot be read again
> by Parquet-MR
> -----------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2069
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2069
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.12.0
>         Environment: Windows 10
>            Reporter: Devon Kozenieski
>            Priority: Blocker
>         Attachments: modified.parquet, original.parquet, parquet-diff.png
>
>
> In the attached files, there is one original file, and one written modified
> file that results after reading the original file and writing it back with
> Parquet-MR, with a few values modified. The schema should not be modified,
> since the schema of the input file is used as the schema to write the output
> file.
> However, the output file has a slightly modified schema that then
> cannot be read back the same way again with Parquet-MR, resulting in the
> exception message: java.lang.ClassCastException: optional binary element
> (STRING) is not a group
> My guess is that the issue lies in the Avro schema conversion.
> The Parquet files attached have some arrays and some nested fields.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
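On the reviewer's open question about what `parquetReader.read()` can throw: to our understanding, parquet-mr declares `ParquetReader.read()` as `throws IOException`, which `testListArrayCompatibility()` already declares, so dropping the catch should still compile. The following JDK-only sketch illustrates the Java rule involved. Only checked exceptions must be declared, and try-with-resources also needs the checked exception from `close()` to be covered. The names `CheckedExceptionSketch`, `RecordSource`, `ListSource` and `readFirst` are hypothetical stand-ins, not parquet-mr APIs.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class CheckedExceptionSketch {

  // Hypothetical stand-in for a reader like ParquetReader<T>: read() returns
  // the next record, or null when the input is exhausted, and declares only
  // the checked IOException. close() is inherited from Closeable.
  public interface RecordSource<T> extends Closeable {
    T read() throws IOException;
  }

  // Hypothetical in-memory implementation, used only for this illustration.
  public static final class ListSource<T> implements RecordSource<T> {
    private final Iterator<T> it;
    private boolean closed = false;

    public ListSource(List<T> records) {
      this.it = records.iterator();
    }

    @Override
    public T read() throws IOException {
      if (closed) {
        throw new IOException("source is closed");
      }
      return it.hasNext() ? it.next() : null;
    }

    @Override
    public void close() {
      closed = true;
    }

    public boolean isClosed() {
      return closed;
    }
  }

  // No catch block: because this method declares IOException, the compiler
  // accepts both read() and the implicit close() of try-with-resources.
  // Unchecked exceptions (e.g. a ClassCastException from a bad cast) would
  // propagate without being declared at all.
  public static <T> T readFirst(RecordSource<T> source) throws IOException {
    try (RecordSource<T> s = source) {
      return s.read();
    }
  }

  public static void main(String[] args) throws IOException {
    ListSource<String> source = new ListSource<>(List.of("first", "second"));
    System.out.println(readFirst(source));  // prints "first"
    System.out.println(source.isClosed());  // prints "true"
  }
}
```

Applied to the test under review, the same shape would be `try (ParquetReader<Object> parquetReader = AvroParquetReader.builder(testPath).withConf(conf).build()) { ... }`, relying on `ParquetReader` implementing `Closeable`. This is a suggestion, not necessarily what PR #957 ends up doing.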