GANHONGNAN created PARQUET-2341:
-----------------------------------
Summary: Using column index to filter null pages throws java.lang.ArrayIndexOutOfBoundsException: -1
Key: PARQUET-2341
URL: https://issues.apache.org/jira/browse/PARQUET-2341
Project: Parquet
Issue Type: Bug
Reporter: GANHONGNAN
The column index of the affected column contains an all-null page, like the following:
{code:java}
Boundary order: ASCENDING
           null count   min      max
page-0     2            <none>   <none>
{code}
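For context, a file with such a page can arise when an optional column is simply never set, so every written page for it holds only nulls. Below is a minimal sketch using parquet-mr's example writer; the schema, output path, class name, and row count are assumptions for illustration, not taken from the reported table.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class WriteAllNullColumn {
  public static void main(String[] args) throws Exception {
    // An optional INT32 column that is never set, so its only page contains nothing but nulls.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message test { required int32 id; optional int32 empty_page_column; }");
    Configuration conf = new Configuration();
    GroupWriteSupport.setSchema(schema, conf);

    SimpleGroupFactory factory = new SimpleGroupFactory(schema);
    try (ParquetWriter<Group> writer = ExampleParquetWriter
        .builder(new Path("/tmp/all_null.parquet"))   // hypothetical output path
        .withConf(conf)
        .withType(schema)
        .build()) {
      for (int i = 0; i < 2; i++) {      // two rows => null count 2, min/max <none>
        Group row = factory.newGroup();
        row.append("id", i);             // empty_page_column is left unset -> null
        writer.write(row);
      }
    }
  }
}
{code}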
The query executed in Spark SQL looks like this:
{code:java}
spark.sql("select * from tbl where empty_page_column < 2 or empty_page_column is null").collect
{code}
*When "empty_page_column < 2" and "empty_page_column is null" are used in the same predicate and combined with 'or', an ArrayIndexOutOfBoundsException is thrown.*
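The same predicate can also be expressed directly against parquet-mr, which goes through the same column-index filtering path as the Spark query (FilterApi's eq(column, null) is its spelling of "is null"). A minimal sketch, assuming the column is INT32 and reusing a hypothetical file path; this is for illustration, not the original reproduction code.
{code:java}
import static org.apache.parquet.filter2.predicate.FilterApi.eq;
import static org.apache.parquet.filter2.predicate.FilterApi.intColumn;
import static org.apache.parquet.filter2.predicate.FilterApi.lt;
import static org.apache.parquet.filter2.predicate.FilterApi.or;

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ReproColumnIndexAioobe {
  public static void main(String[] args) throws Exception {
    // "empty_page_column < 2 OR empty_page_column IS NULL" as a parquet-mr FilterPredicate.
    FilterPredicate pred = or(
        lt(intColumn("empty_page_column"), 2),
        eq(intColumn("empty_page_column"), null));

    try (ParquetReader<Group> reader = ParquetReader
        .builder(new GroupReadSupport(), new Path("/tmp/all_null.parquet"))  // hypothetical path
        .withFilter(FilterCompat.get(pred))
        .useColumnIndexFilter(true)   // column-index filtering is the code path in the stack trace
        .build()) {
      for (Group row = reader.read(); row != null; row = reader.read()) {
        System.out.println(row);
      }
    }
  }
}
{code}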
The following error is thrown:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.parquet.internal.column.columnindex.IntColumnIndexBuilder$IntColumnIndex$1.compareValueToMin(IntColumnIndexBuilder.java:74)
    at org.apache.parquet.internal.column.columnindex.BoundaryOrder$2.lt(BoundaryOrder.java:123)
    at org.apache.parquet.internal.column.columnindex.ColumnIndexBuilder$ColumnIndexBase.visit(ColumnIndexBuilder.java:262)
    at org.apache.parquet.internal.column.columnindex.ColumnIndexBuilder$ColumnIndexBase.visit(ColumnIndexBuilder.java:64)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.lambda$visit$2(ColumnIndexFilter.java:131)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.applyPredicate(ColumnIndexFilter.java:176)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:131)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
    at org.apache.parquet.filter2.predicate.Operators$Lt.accept(Operators.java:209)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:191)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
    at org.apache.parquet.filter2.predicate.Operators$Or.accept(Operators.java:321)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:186)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
    at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81)
    at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
    at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
    at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:1128)
    at org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:943)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initializeParquetReader(SpecificParquetRecordReaderBase.java:137)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:107)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:214)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(ParquetFileFormat.scala:413)
    ... 25 more
{code}