[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684261#comment-17684261 ]
ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------

yabola commented on PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#issuecomment-1417093967

   @wgtmac Thanks for the review. I will address your comments, and I have updated my PR description to explain the change in more detail.


> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2237
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2237
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major
>
> The Bloom filter has to be loaded from the filesystem, which can cost time and space. If we
> can determine exactly whether the value exists from the other
> filters, then we can skip the Bloom filter and improve performance.
>
> When the min and max values in StatisticsFilter are the same, we can determine exactly
> whether the value exists.
> When we have page dictionaries, we can also determine exactly
> whether the value exists.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
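
To make the idea in the issue description concrete, here is a minimal Java sketch of the decision logic for an equality predicate on one row group. It is not the code from PR #1023 and does not use the parquet-mr API: the class `ExactMatchSketch`, the `Verdict` enum, and every method name are hypothetical. It also assumes that the min/max statistics are present and exact, that the column chunk holds at least one non-null value, and that a dictionary verdict is only trusted when all pages of the chunk are dictionary encoded.

```java
import java.util.Set;

// Hypothetical sketch: none of these names come from parquet-mr.
final class ExactMatchSketch {

    /** Outcome of evaluating `column == value` against one row group. */
    enum Verdict {
        ABSENT,   // value is definitely not in the row group: drop it
        PRESENT,  // value is definitely in the row group: keep it, no more checks
        UNKNOWN   // cannot tell: fall through to the next filter
    }

    /** Min/max statistics check (assumes exact, non-truncated statistics). */
    static Verdict fromMinMax(long min, long max, long value) {
        if (value < min || value > max) {
            return Verdict.ABSENT;
        }
        // value lies in [min, max]; if min == max, value must equal both.
        return (min == max) ? Verdict.PRESENT : Verdict.UNKNOWN;
    }

    /** Dictionary check, trusted only if every page is dictionary encoded. */
    static Verdict fromDictionary(Set<Long> dictionary,
                                  boolean allPagesDictionaryEncoded,
                                  long value) {
        if (!allPagesDictionaryEncoded) {
            return Verdict.UNKNOWN;
        }
        return dictionary.contains(value) ? Verdict.PRESENT : Verdict.ABSENT;
    }

    /** The Bloom filter is only worth loading if both cheaper checks are inconclusive. */
    static boolean needsBloomFilter(long min, long max,
                                    Set<Long> dictionary,
                                    boolean allPagesDictionaryEncoded,
                                    long value) {
        if (fromMinMax(min, max, value) != Verdict.UNKNOWN) {
            return false;
        }
        return fromDictionary(dictionary, allPagesDictionaryEncoded, value)
                == Verdict.UNKNOWN;
    }

    public static void main(String[] args) {
        // min == max == value: statistics alone settle the question.
        System.out.println(fromMinMax(7, 7, 7));
        // Value in range but min != max, no usable dictionary: Bloom filter needed.
        System.out.println(needsBloomFilter(1, 9, Set.of(1L, 9L), false, 5));
    }
}
```

The point of the three-valued verdict is that the existing filters would otherwise only answer "drop" or "might match"; distinguishing "definitely matches" is what lets the later, more expensive Bloom filter lookup be skipped.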