Taras Bobrovytsky created PARQUET-387:
-----------------------------------------

             Summary: TwoLevelListWriter does not handle null values in array
                 Key: PARQUET-387
                 URL: https://issues.apache.org/jira/browse/PARQUET-387
             Project: Parquet
          Issue Type: Bug
            Reporter: Taras Bobrovytsky


parquet-mr fails to write records that use the following Avro schema:
{code}
{"type": "record",
 "namespace": "com.cloudera.impala",
 "name": "table_3",
 "fields": [
   {"name": "field_6", "type":
     {"type": "array", "items": ["null",
       {"type": "map", "values": ["null", "string"]}]}}]}
{code}

If a map element of the array is null, writing the record fails with the following exception:
{code}
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at parquet.avro.AvroWriteSupport.writeMap(AvroWriteSupport.java:185)
        at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:277)
        at parquet.avro.AvroWriteSupport.access$400(AvroWriteSupport.java:48)
        at parquet.avro.AvroWriteSupport$TwoLevelListWriter.writeCollection(AvroWriteSupport.java:473)
        at parquet.avro.AvroWriteSupport$ListWriter.writeList(AvroWriteSupport.java:322)
        at parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
        at parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:169)
        at parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:144)
        at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116)
        at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:324)
        at com.cloudera.impala.datagenerator.RandomNestedDataGenerator.writeFile(RandomNestedDataGenerator.java:69)
        at com.cloudera.impala.datagenerator.RandomNestedDataGenerator.main(RandomNestedDataGenerator.java:284)
{code}

The likely cause is that TwoLevelListWriter does not check whether an array element is null before passing it on to be written:
https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L456
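To make the suspected cause concrete, here is a minimal, self-contained Java sketch. It does not use the parquet-mr API: `writeMap`, `writeCollectionUnchecked` and `writeCollectionChecked` are hypothetical stand-ins that only model the control flow in the stack trace above (a list writer looping over elements and handing each one to a map writer that dereferences it). The checked variant adds the missing null test and, as an assumed behavior, fails fast with the index of the null element instead of silently dropping it. Our understanding is that the two-level list layout has no way to mark an individual element as null, so skipping the element would change the array's contents.

{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two-level list write path; names do not match parquet-mr.
public class TwoLevelListNullCheck {

    // Stand-in for writeMap: the stack trace shows the NPE is raised inside writeMap,
    // so this model also dereferences the map without checking for null.
    static void writeMap(Map<String, String> map, List<String> out) {
        out.add("map(" + map.size() + ")"); // throws NullPointerException if map is null
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
    }

    // Unguarded loop, modeling the behavior reported in this issue.
    static void writeCollectionUnchecked(Collection<Map<String, String>> elements, List<String> out) {
        for (Map<String, String> element : elements) {
            writeMap(element, out);
        }
    }

    // Guarded loop: checks each element first and reports the index of a null one.
    // Failing fast (rather than skipping) is an assumption, based on our understanding
    // that a two-level list cannot mark a single element as null.
    static void writeCollectionChecked(Collection<Map<String, String>> elements, List<String> out) {
        int index = 0;
        for (Map<String, String> element : elements) {
            if (element == null) {
                throw new IllegalArgumentException(
                    "Array contains a null element at index " + index);
            }
            writeMap(element, out);
            index++;
        }
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k", "v");
        List<Map<String, String>> field6 = new ArrayList<>();
        field6.add(m);
        field6.add(null); // a null map element, allowed by the ["null", map] union

        try {
            writeCollectionUnchecked(field6, new ArrayList<>());
        } catch (NullPointerException e) {
            System.out.println("unchecked: NullPointerException");
        }
        try {
            writeCollectionChecked(field6, new ArrayList<>());
        } catch (IllegalArgumentException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
{code}

Running main should show the unchecked loop failing with a NullPointerException, and the checked loop reporting the null element at index 1 with a message that points at the offending data instead of at writeMap internals.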



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
