I was able to get something working locally. I'll open a JIRA and have a PR once I have sufficient tests in place.
// ah

From: Hailu, Andreas [Engineering]
Sent: Friday, May 14, 2021 12:09 PM
To: [email protected]
Subject: AvroParquetOutputFormat - Unable to Write Arrays with Null Elements

Hi folks,

I'm using v1.11.1 of the parquet-mr library as part of a Java application that takes Avro records and writes them into Parquet files using the AvroParquetOutputFormat. There are Avro records with array type fields that will have null elements, e.g. [ "Foo", "Bar", null, "Baz" ]. Here's an example Avro schema:

    {
      "type": "record",
      "name": "NullLists",
      "namespace": "com.test",
      "fields": [
        {
          "name": "KeyID",
          "type": "string"
        },
        {
          "name": "NullableList",
          "type": [
            "null",
            {
              "type": "array",
              "items": [ "null", "string" ]
            }
          ],
          "default": null
        }
      ]
    }

I'm trying to write the following record:

    {
      "KeyID": "0",
      "NullableList": [ "foo", null, "baz" ]
    }

I thought I could use the 3-level list writer to support this, however, it results in the following exception:

    Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
        at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
        at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
        at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
        at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
        at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
        at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
        at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)

Is this kind of record supported? I have also tried the "parquet.avro.add-list-element-records" option set to false as well, with no luck.

____________
Andreas Hailu
Data Lake Engineering | Goldman Sachs & Co.
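
For readers following the thread, here is a rough sketch of why the list layout matters for null elements. It is not parquet-mr code: the `ListLevels` class and its helpers are hypothetical names made up for illustration. It assumes the exception text means the Parquet schema was the 2-level form (`repeated binary array (STRING)`) while the 3-level writer expected a repeated group wrapping an optional element. In Parquet, a column's maximum definition level is the number of non-required fields on its path, and a null element needs a level of its own, which only an optional leaf provides.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only; hypothetical names, not part of parquet-mr.
final class ListLevels {
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // 2-level layout, as named in the exception:
    //   optional group NullableList (LIST) { repeated binary array (STRING); }
    static final List<Repetition> TWO_LEVEL =
            Arrays.asList(Repetition.OPTIONAL, Repetition.REPEATED);

    // 3-level layout from the Parquet LIST specification:
    //   optional group NullableList (LIST) {
    //     repeated group list { optional binary element (STRING); }
    //   }
    static final List<Repetition> THREE_LEVEL =
            Arrays.asList(Repetition.OPTIONAL, Repetition.REPEATED, Repetition.OPTIONAL);

    // Maximum definition level = count of non-required fields on the column path.
    static int maxDefinitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r != Repetition.REQUIRED) {
                level++;
            }
        }
        return level;
    }

    // A null element is representable only when the leaf field itself is optional,
    // so that "slot exists but value is null" gets its own definition level.
    static boolean canEncodeNullElement(List<Repetition> path) {
        return path.get(path.size() - 1) == Repetition.OPTIONAL;
    }

    public static void main(String[] args) {
        System.out.println("2-level max definition level: " + maxDefinitionLevel(TWO_LEVEL));
        System.out.println("2-level can encode null element: " + canEncodeNullElement(TWO_LEVEL));
        System.out.println("3-level max definition level: " + maxDefinitionLevel(THREE_LEVEL));
        System.out.println("3-level can encode null element: " + canEncodeNullElement(THREE_LEVEL));
    }
}
```

Under this model the 2-level path has levels for "list is null", "list is empty" and "element present" only, while the 3-level path adds a fourth for "element is null". parquet-avro chooses between the two layouts with the `parquet.avro.write-old-list-structure` setting; whether setting it to false is enough through AvroParquetOutputFormat on 1.11.1 is not confirmed in this thread.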
