[ https://issues.apache.org/jira/browse/PARQUET-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Hailu updated PARQUET-2051:
-----------------------------------
    Fix Version/s: 1.12.3

> AvroWriteSupport does not pass Configuration to AvroSchemaConverter on Creation
> -------------------------------------------------------------------------------
>
>                 Key: PARQUET-2051
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2051
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Andreas Hailu
>            Assignee: Andreas Hailu
>            Priority: Major
>             Fix For: 1.12.3
>
>
> Because of this, we're unable to fully leverage the ThreeLevelListWriter
> functionality when trying to write Avro lists out using Parquet through the
> AvroParquetOutputFormat.
>
> The following record is used for testing:
>
> Schema:
> {
>   "type": "record",
>   "name": "NullLists",
>   "namespace": "com.test",
>   "fields": [
>     { "name": "KeyID", "type": "string" },
>     {
>       "name": "NullableList",
>       "type": [ "null", { "type": "array", "items": [ "null", "string" ] } ],
>       "default": null
>     }
>   ]
> }
>
> Record (using basic JSON just for display purposes):
> { "KeyID": "0", "NullableList": [ "foo", null, "baz" ] }
>
> During testing, we see the following exception:
>
> Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
>   at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
>   at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
>   at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
>   at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
>   at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
>   at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>   at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>   at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>
> Upon review, it was found that the configuration option that was set in
> AvroWriteSupport for the ThreeLevelListWriter,
> parquet.avro.write-old-list-structure being set to false, was never shared
> with the AvroSchemaConverter.
>
> Once we made this change and tested locally, we observe the record with nulls
> in the array being successfully written by AvroParquetOutputFormat.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
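
For readers following the issue, the mismatch described above can be illustrated with a small toy model. This is only a sketch and not the parquet-avro code: the class `ListLayoutSketch`, its methods, and the plain `Map` standing in for the Hadoop `Configuration` are all hypothetical, and the assumption that the flag defaults to true when absent reflects my understanding of the library rather than anything stated in the issue. The only name taken from the report is the property key parquet.avro.write-old-list-structure.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not parquet-avro): shows how a schema converter
// built without the job configuration can disagree with the writer.
class ListLayoutSketch {
    // Property key quoted in the issue.
    static final String WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure";

    enum ListLayout { TWO_LEVEL, THREE_LEVEL }

    // Assumption: the flag defaults to true (old two-level lists) when unset.
    static ListLayout layoutFor(Map<String, String> conf) {
        boolean old = Boolean.parseBoolean(conf.getOrDefault(WRITE_OLD_LIST_STRUCTURE, "true"));
        return old ? ListLayout.TWO_LEVEL : ListLayout.THREE_LEVEL;
    }

    // The writer side reads the job configuration.
    static ListLayout writerLayout(Map<String, String> jobConf) {
        return layoutFor(jobConf);
    }

    // Before the fix: the converter is created without the job configuration,
    // so it only ever sees the default.
    static ListLayout schemaLayoutBeforeFix(Map<String, String> jobConf) {
        return layoutFor(new HashMap<>());
    }

    // After the fix: the converter receives the same configuration as the writer.
    static ListLayout schemaLayoutAfterFix(Map<String, String> jobConf) {
        return layoutFor(jobConf);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(WRITE_OLD_LIST_STRUCTURE, "false");

        System.out.println("writer:            " + writerLayout(jobConf));
        System.out.println("schema before fix: " + schemaLayoutBeforeFix(jobConf));
        System.out.println("schema after fix:  " + schemaLayoutAfterFix(jobConf));
    }
}
```

In this model, setting the flag to false makes the writer expect a three-level list while the pre-fix schema stays two-level, which loosely mirrors the "is not a group" failure in the stack trace. Passing the same configuration to both sides removes the disagreement.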