[ https://issues.apache.org/jira/browse/PARQUET-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758555#comment-17758555 ]

GANHONGNAN commented on PARQUET-2341:
-------------------------------------

This issue has been resolved by PARQUET-1744: Some filters throws ArrayIndexOutOfBoundsException (#732)
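A plausible model of the failure, consistent with the -1 index in the trace below: the int column index keeps min/max values only for pages that contain at least one non-null value, and an all-null page maps to a -1 sentinel slot, which the unpatched comparison uses directly as an array index. A minimal stdlib-only sketch (class and field names such as pageToValueIndex are illustrative, not the actual parquet-column internals):

```java
public class NullPageSketch {
    // mins[] holds page minimums only for pages with at least one non-null value;
    // pageToValueIndex maps page number -> slot in mins[], or -1 for all-null pages.
    static final int[] mins = {};               // no non-null pages in this file
    static final int[] pageToValueIndex = {-1}; // page-0 is entirely null

    // Mirrors the pre-fix compareValueToMin: it indexes mins[] without checking
    // for the -1 sentinel, so an all-null page throws
    // ArrayIndexOutOfBoundsException.
    static int compareValueToMin(int page, int value) {
        return Integer.compare(value, mins[pageToValueIndex[page]]);
    }

    public static void main(String[] args) {
        try {
            // Evaluating "empty_page_column < 2" against the all-null page-0.
            compareValueToMin(0, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```

The PARQUET-1744 fix avoids this by handling null pages explicitly before comparing against page minimums.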

 

> Using column index to filtering null page got 
> java.lang.ArrayIndexOutOfBoundsException: -1
> ------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2341
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2341
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: GANHONGNAN
>            Priority: Blocker
>
> An empty page index like the following:
>  
> {code:java}
> Boundary order: ASCENDING
>                       null count  min      max
> page-0                         2  <none>   <none>
> {code}
>  
> My SQL in Spark SQL looks like this:
>  
> {code:java}
> spark.sql("select * from tbl where empty_page_column < 2 or empty_page_column is null").collect
> {code}
>  
>  
> *When both "empty_page_column < 2" and "empty_page_column is null" are used at the same time and combined with 'or', an ArrayIndexOutOfBoundsException is thrown.*
>  
> The following error is thrown:
>  
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>       at org.apache.parquet.internal.column.columnindex.IntColumnIndexBuilder$IntColumnIndex$1.compareValueToMin(IntColumnIndexBuilder.java:74)
>       at org.apache.parquet.internal.column.columnindex.BoundaryOrder$2.lt(BoundaryOrder.java:123)
>       at org.apache.parquet.internal.column.columnindex.ColumnIndexBuilder$ColumnIndexBase.visit(ColumnIndexBuilder.java:262)
>       at org.apache.parquet.internal.column.columnindex.ColumnIndexBuilder$ColumnIndexBase.visit(ColumnIndexBuilder.java:64)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.lambda$visit$2(ColumnIndexFilter.java:131)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.applyPredicate(ColumnIndexFilter.java:176)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:131)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
>       at org.apache.parquet.filter2.predicate.Operators$Lt.accept(Operators.java:209)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:191)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
>       at org.apache.parquet.filter2.predicate.Operators$Or.accept(Operators.java:321)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:186)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
>       at org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81)
>       at org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
>       at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
>       at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:1128)
>       at org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:943)
>       at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initializeParquetReader(SpecificParquetRecordReaderBase.java:137)
>       at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:107)
>       at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:214)
>       at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(ParquetFileFormat.scala:413)
>       ... 25 more
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
