[ 
https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183312#comment-17183312
 ] 

Gabor Szadovszky commented on PARQUET-1901:
-------------------------------------------

It is clear that we need to handle this case properly. I quickly checked the 
other filters ({{DictionaryFilter}}, {{StatisticsFilter}} and 
{{BloomFilterImpl}}), and none of them handles a {{null}} filter (they all 
throw an NPE). So I would vote against adding a {{null}} check to 
{{ColumnIndexFilter}}. Instead, the call sites should handle a {{null}} filter, 
as is done 
[here|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L870-L872].
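
As a rough illustration of handling the {{null}} filter at the call site, here 
is a minimal, self-contained sketch. It does not use the parquet-mr classes: 
{{NullFilterGuard}}, the {{Predicate<Integer>}} filter, the 
{{useColumnIndexFilter}} parameter and the string results are all hypothetical 
stand-ins. Only the method names mirror the ones in the stack trace quoted 
below.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical stand-in types; this is NOT the parquet-mr implementation.
public class NullFilterGuard {

    // Plays the role of ColumnIndexFilter.calculateRowRanges(): it assumes a
    // non-null filter and fails fast otherwise, like the other filters do.
    public static String calculateRowRanges(Predicate<Integer> filter, int rowCount) {
        Objects.requireNonNull(filter, "filter");
        int matched = 0;
        for (int row = 0; row < rowCount; row++) {
            if (filter.test(row)) {
                matched++;
            }
        }
        return "filtered:" + matched;
    }

    // Plays the role of the unfiltered read path.
    public static String readNextRowGroup(int rowCount) {
        return "all:" + rowCount;
    }

    // The caller guards against a missing filter, so the filter
    // implementation itself never sees null.
    public static String readNextFilteredRowGroup(
            Predicate<Integer> filter, boolean useColumnIndexFilter, int rowCount) {
        if (!useColumnIndexFilter || filter == null) {
            return readNextRowGroup(rowCount);
        }
        return calculateRowRanges(filter, rowCount);
    }

    public static void main(String[] args) {
        // No filter set: falls back to the unfiltered path instead of an NPE.
        System.out.println(readNextFilteredRowGroup(null, true, 10));
        // Filter set: only even row indexes match.
        System.out.println(readNextFilteredRowGroup(row -> row % 2 == 0, true, 10));
    }
}
```

With the guard in the caller, users can always call the filtered read method 
and do not have to branch on whether a filter was configured.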

> Add filter null check for ColumnIndex  
> ---------------------------------------
>
>                 Key: PARQUET-1901
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1901
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> This Jira is opened to discuss whether we should add a null check for the 
> filter when ColumnIndex is enabled. 
> In the ColumnIndexFilter#calculateRowRanges() method, the input parameter 
> 'filter' is assumed to be non-null without checking. It throws an NPE when 
> ColumnIndex is enabled (the default) but no filter is set in the 
> ParquetReadOptions. The call stack is below. 
>     java.lang.NullPointerException
>         at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
>         at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
>         at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:891)
> If we don't add it, users would have to choose between readNextRowGroup() and 
> readNextFilteredRowGroup() depending on whether a filter is set. 
> Thoughts?  
>   


