[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684261#comment-17684261 ]

ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------

yabola commented on PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1417093967

   @wgtmac Thanks for the review. I will address your comments, and I have updated
my PR description to explain in more detail.




> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2237
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2237
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major
>
> The Bloom filter needs to be loaded from the file system, which may cost time
> and space. If we can exactly determine the existence/nonexistence of the value
> from other filters, then we can avoid using the Bloom filter to improve
> performance.
>  
> When the min and max values in StatisticsFilter are the same, we can exactly
> determine the existence/nonexistence of the value.
> When we have page dictionaries, we can also determine the
> existence/nonexistence of the value.
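A minimal sketch of the decision order described in the issue, assuming an equality predicate (column == value) on a long column of a single row group. The names `ExactRowGroupCheck`, `Match`, `fromMinMax`, `fromDictionary`, `canDrop`, and the Bloom filter loader are hypothetical and are not the parquet-mr API. The dictionary check further assumes that every page in the column chunk is dictionary-encoded and that the dictionary holds only values that were actually written.

```java
import java.util.Set;
import java.util.function.LongPredicate;
import java.util.function.Supplier;

// Hypothetical sketch, not the parquet-mr API: cheap exact checks first,
// Bloom filter only when neither check can decide.
final class ExactRowGroupCheck {

    /** Three-valued outcome of a cheap metadata check. */
    enum Match { DEFINITELY_PRESENT, DEFINITELY_ABSENT, UNKNOWN }

    /**
     * Min/max statistics check. Assumes min/max exist only when the chunk has
     * at least one non-null value, so min == max means every non-null value equals min.
     */
    static Match fromMinMax(long min, long max, long value) {
        if (value < min || value > max) return Match.DEFINITELY_ABSENT;
        if (min == max) return Match.DEFINITELY_PRESENT;
        return Match.UNKNOWN;
    }

    /**
     * Dictionary check. A null dictionary means no usable dictionary; otherwise it is
     * assumed to contain exactly the distinct values written to the column chunk.
     */
    static Match fromDictionary(Set<Long> dictionary, long value) {
        if (dictionary == null) return Match.UNKNOWN;
        return dictionary.contains(value) ? Match.DEFINITELY_PRESENT : Match.DEFINITELY_ABSENT;
    }

    /** Returns true if the row group can be skipped for the predicate column == value. */
    static boolean canDrop(long min, long max, Set<Long> dictionary, long value,
                           Supplier<LongPredicate> bloomLoader) {
        Match m = fromMinMax(min, max, value);
        if (m == Match.UNKNOWN) m = fromDictionary(dictionary, value);
        if (m != Match.UNKNOWN) return m == Match.DEFINITELY_ABSENT;
        // Only now do we pay the cost of reading the Bloom filter from the file system.
        return !bloomLoader.get().test(value);
    }
}
```

The Bloom filter is passed as a `Supplier` so it is read only when both cheap checks return `UNKNOWN`, which is the saving the issue proposes; when min == max or a dictionary is available, the answer is exact and the filter is never loaded.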



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
