Soumyakanti Das created HIVE-25893: -------------------------------------- Summary: NPE when reading Parquet data because ColumnVector isNull[] is not updated Key: HIVE-25893 URL: https://issues.apache.org/jira/browse/HIVE-25893 Project: Hive Issue Type: Bug Reporter: Soumyakanti Das Assignee: Soumyakanti Das
In [VectorizedListColumnReader.java|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java] {{isNull[]}} is used in the comparison methods ( eg. [columnVectorsDifferNullForSameIndex |https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L524] ), however, {{isNull}} is always {{false}} as it is never updated in [getChildData|https://github.com/apache/hive/blob/595f3bc9d612f02581bd3377ee0107efd6553ae6/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java#L401]. This could result in NullPointerException like, {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareBytesColumnVector(VectorizedListColumnReader.java:506) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.compareColumnVector(VectorizedListColumnReader.java:432) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.setIsRepeating(VectorizedListColumnReader.java:367) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:360) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:83) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedMapColumnReader.readBatch(VectorizedMapColumnReader.java:57) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:438) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:377) at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:100) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:375) {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)