[ https://issues.apache.org/jira/browse/PARQUET-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018972#comment-15018972 ]
Ryan Blue commented on PARQUET-387:
-----------------------------------
As Cheng notes, setting the configuration property to use the new, 3-level list
structure fixes the problem. Support for arrays that contain null was the point
of moving to the 3-level structure. Unfortunately, we couldn't do that by
default because it is a behavior change. That's why we have the property.
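To see why the 2-level structure has no room for a null element, here is a small standalone sketch of Dremel-style repetition/definition levels for a required list of optional values. It is not parquet-mr code: the class and methods (`ListLevels`, `threeLevel`, `twoLevel`) are hypothetical, the element is simplified to an int32 instead of the map in this issue, and the schemas in the comments are assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListLevels {
    /**
     * Hypothetical shredder for the 3-level layout (assumed schema):
     *   required group f (LIST) { repeated group list { optional int32 element; } }
     * Definition levels: 0 = empty list, 1 = null element, 2 = element present.
     */
    static List<String> threeLevel(List<Integer> values) {
        List<String> out = new ArrayList<>();
        if (values.isEmpty()) {
            out.add("r=0 d=0"); // repeated group absent: the list is empty
            return out;
        }
        for (int i = 0; i < values.size(); i++) {
            int rep = (i == 0) ? 0 : 1; // 0 starts the list, 1 continues it
            Integer v = values.get(i);
            out.add(v == null ? "r=" + rep + " d=1" : "r=" + rep + " d=2 v=" + v);
        }
        return out;
    }

    /**
     * Hypothetical shredder for the 2-level layout (assumed schema):
     *   required group f (LIST) { repeated int32 array; }
     * Definition levels: 0 = empty list, 1 = element present. No level is left
     * for "element exists but is null", so a null element cannot be encoded.
     */
    static List<String> twoLevel(List<Integer> values) {
        List<String> out = new ArrayList<>();
        if (values.isEmpty()) {
            out.add("r=0 d=0");
            return out;
        }
        for (int i = 0; i < values.size(); i++) {
            int rep = (i == 0) ? 0 : 1;
            Integer v = values.get(i);
            if (v == null) {
                throw new NullPointerException(
                    "2-level list cannot encode a null element at index " + i);
            }
            out.add("r=" + rep + " d=1 v=" + v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, null, 2);
        System.out.println(threeLevel(data)); // [r=0 d=2 v=1, r=1 d=1, r=1 d=2 v=2]
        try {
            twoLevel(data);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For `[1, null, 2]` the 3-level form spends definition level 1 on the null element, while the 2-level form only has level 0 (empty list) and level 1 (element present), so its only choices are to drop the null or to fail.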
As for what the correct behavior is in this case, I think the NPE is
actually right. Because the 2-level structure can't represent null elements, we
have the choice of either silently dropping them or throwing an exception.
Silently dropping them clearly isn't a good idea, so the NPE is the best option.
What we could do to make this friendlier is catch the NPE and throw a new one
with an error message that tells you what happened: null values aren't allowed
in Avro arrays unless you write with the 3-level list property enabled.
> TwoLevelListWriter does not handle null values in array
> -------------------------------------------------------
>
> Key: PARQUET-387
> URL: https://issues.apache.org/jira/browse/PARQUET-387
> Project: Parquet
> Issue Type: Bug
> Reporter: Taras Bobrovytsky
>
> parquet-mr is unable to handle the following Avro schema:
> {code}
> {"type": "record",
>  "namespace": "com.cloudera.impala",
>  "name": "table_3",
>  "fields": [
>    {"name": "field_6",
>     "type": {"type": "array",
>              "items": ["null",
>                        {"type": "map", "values": ["null", "string"]}]}}
>  ]}
> {code}
> If a map element in the array is null, the following exception is thrown:
> {code}
> java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at parquet.avro.AvroWriteSupport.writeMap(AvroWriteSupport.java:185)
>     at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:277)
>     at parquet.avro.AvroWriteSupport.access$400(AvroWriteSupport.java:48)
>     at parquet.avro.AvroWriteSupport$TwoLevelListWriter.writeCollection(AvroWriteSupport.java:473)
>     at parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:322)
>     at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>     at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:169)
>     at parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:144)
>     at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
>     at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)
>     at com.cloudera.impala.datagenerator.RandomNestedDataGenerator.writeFile(RandomNestedDataGenerator.java:69)
>     at com.cloudera.impala.datagenerator.RandomNestedDataGenerator.main(RandomNestedDataGenerator.java:284)
> {code}
> The likely cause is that TwoLevelListWriter does not check whether an array
> element is null before writing it:
> https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L456
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)