Hi folks, I'm using v1.11.1 of the parquet-mr library as part of a Java
application that takes Avro records and writes them into Parquet files using
AvroParquetOutputFormat. Some of the Avro records have array fields that can
contain null elements, e.g. ["Foo", "Bar", null, "Baz"]. Here's an example
Avro schema:
{
  "type": "record",
  "name": "NullLists",
  "namespace": "com.test",
  "fields": [
    {
      "name": "KeyID",
      "type": "string"
    },
    {
      "name": "NullableList",
      "type": [
        "null",
        {
          "type": "array",
          "items": [
            "null",
            "string"
          ]
        }
      ],
      "default": null
    }
  ]
}
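For reference, the NullableList field can end up in one of two Parquet list
layouts, depending on whether the legacy 2-level structure or the 3-level
structure is used (in parquet-avro the write side chooses between them with
"parquet.avro.write-old-list-structure"). The sketch below just hard-codes both
layouts as strings. The class ListLayouts and method nullableListLayout are
hypothetical, and the field names (array, list, element) assume parquet-avro's
default naming; nothing here is generated by the real AvroSchemaConverter.

```java
public class ListLayouts {

    // Hypothetical helper: returns the assumed Parquet layout of NullableList.
    // legacyTwoLevel == true models the old 2-level structure, where the
    // repeated field IS the element, so a null element has nowhere to go.
    static String nullableListLayout(boolean legacyTwoLevel) {
        if (legacyTwoLevel) {
            return String.join("\n",
                "optional group NullableList (LIST) {",
                "  repeated binary array (STRING);",
                "}");
        }
        // 3-level structure: a repeated group wraps an optional element,
        // so each list slot can exist while its value is null.
        return String.join("\n",
            "optional group NullableList (LIST) {",
            "  repeated group list {",
            "    optional binary element (STRING);",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        System.out.println("2-level:\n" + nullableListLayout(true));
        System.out.println("3-level:\n" + nullableListLayout(false));
    }
}
```

The legacy layout contains "repeated binary array (STRING)", the same type
named in the ClassCastException further down, which suggests the file schema
was built with the 2-level layout while the 3-level writer was selected.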
I'm trying to write the following record:
{
  "KeyID": "0",
  "NullableList": [
    "foo",
    null,
    "baz"
  ]
}
I thought I could use the 3-level list writer to support this; however, doing
so results in the following exception:
Caused by: java.lang.ClassCastException: repeated binary array (STRING) is not a group
    at org.apache.parquet.schema.Type.asGroupType(Type.java:250)
    at org.apache.parquet.avro.AvroWriteSupport$ThreeLevelListWriter.writeCollection(AvroWriteSupport.java:612)
    at org.apache.parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:397)
    at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:355)
    at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
Is this kind of record supported? I have also tried setting the
"parquet.avro.add-list-element-records" option to false, with no luck.
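The reason null elements need the 3-level layout shows up in the repetition
and definition levels Parquet stores for each value. The sketch below is a
simplified, hypothetical model of those levels for NullableList (ListLevels and
its methods are made up; this is not parquet-mr code). Under the 3-level layout
the maximum definition level is 3, so a present-but-null element gets level 2.
The 2-level layout tops out at 2 and has no level left to mark a null element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListLevels {

    // One (repetition level, definition level, value) triple per leaf slot.
    static final class Entry {
        final int r;
        final int d;
        final String value;

        Entry(int r, int d, String value) {
            this.r = r;
            this.d = d;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(r=" + r + ", d=" + d + ", " + value + ")";
        }
    }

    // 3-level layout: d=0 list null, d=1 list empty,
    // d=2 slot present but element null, d=3 element present.
    static List<Entry> threeLevel(List<String> list) {
        List<Entry> out = new ArrayList<>();
        if (list == null) { out.add(new Entry(0, 0, null)); return out; }
        if (list.isEmpty()) { out.add(new Entry(0, 1, null)); return out; }
        for (int i = 0; i < list.size(); i++) {
            String v = list.get(i);
            int r = (i == 0) ? 0 : 1;   // 0 starts a new list, 1 continues it
            int d = (v == null) ? 2 : 3;
            out.add(new Entry(r, d, v));
        }
        return out;
    }

    // 2-level layout: d=0 list null, d=1 list empty, d=2 element present.
    // There is no level meaning "slot present, value null", so reject nulls.
    static List<Entry> twoLevel(List<String> list) {
        List<Entry> out = new ArrayList<>();
        if (list == null) { out.add(new Entry(0, 0, null)); return out; }
        if (list.isEmpty()) { out.add(new Entry(0, 1, null)); return out; }
        for (int i = 0; i < list.size(); i++) {
            String v = list.get(i);
            if (v == null) {
                throw new IllegalArgumentException(
                    "null element at index " + i + " cannot be encoded in the 2-level layout");
            }
            out.add(new Entry(i == 0 ? 0 : 1, 2, v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> record = Arrays.asList("foo", null, "baz");
        System.out.println(threeLevel(record));
        try {
            twoLevel(record);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For the record above, main prints [(r=0, d=3, foo), (r=1, d=2, null),
(r=1, d=3, baz)] and then the 2-level rejection message.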
____________
Andreas Hailu
Data Lake Engineering | Goldman Sachs & Co.