[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684133#comment-17684133 ]
ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------

yabola opened a new pull request, #1023:
URL: https://github.com/apache/parquet-mr/pull/1023

   The Bloom filter has to be loaded from the filesystem, which can cost both time and space. If other filters can determine exactly whether a value exists or not, we can skip the Bloom filter and improve performance.

   When the min and max values in StatisticsFilter are equal, we can determine exactly whether the value exists.
   When we have page dictionaries, we can also determine exactly whether the value exists.


> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2237
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2237
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
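The idea in the pull request can be illustrated with a small, hypothetical sketch for a single equality predicate (`column == value`) on a `long` column. It does not use parquet-mr's real `RowGroupFilter`, `StatisticsFilter` or Bloom filter classes; every name here (`RowGroupPruningSketch`, `ColumnChunkInfo`, `BloomFilterLoader`, `canMatch`) is invented for illustration. It assumes the column chunk has at least one non-null value (so min and max exist) and that a dictionary is only supplied when every data page is dictionary-encoded, since otherwise the dictionary does not list all values. The cheap checks return a three-valued answer, and the expensive Bloom filter load happens only when neither check is exact.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these types are parquet-mr APIs.
public class RowGroupPruningSketch {

    // Three-valued outcome of checking one row group against `column == value`.
    public enum Result { CANNOT_MATCH, MUST_MATCH, MIGHT_MATCH }

    // Hypothetical per-column-chunk metadata. Assumes at least one non-null value,
    // and that `dictionary` is present only if every data page is dictionary-encoded.
    public record ColumnChunkInfo(long min, long max, Optional<Set<Long>> dictionary) {}

    // Hypothetical stand-in for reading a Bloom filter from the filesystem (costly).
    public interface BloomFilterLoader { boolean mightContain(long value); }

    public static Result checkStatistics(ColumnChunkInfo info, long value) {
        if (value < info.min() || value > info.max()) return Result.CANNOT_MATCH;
        // min == max: every non-null value equals min, and value passed the range
        // check, so the value is known to be present.
        if (info.min() == info.max()) return Result.MUST_MATCH;
        return Result.MIGHT_MATCH;
    }

    public static Result checkDictionary(ColumnChunkInfo info, long value) {
        // A complete dictionary answers membership exactly; no dictionary means unknown.
        return info.dictionary()
                .map(d -> d.contains(value) ? Result.MUST_MATCH : Result.CANNOT_MATCH)
                .orElse(Result.MIGHT_MATCH);
    }

    // True if the row group may contain `value`. The Bloom filter is consulted
    // only when statistics and dictionary both return MIGHT_MATCH.
    public static boolean canMatch(ColumnChunkInfo info, long value, BloomFilterLoader bloom) {
        Result r = checkStatistics(info, value);
        if (r == Result.MIGHT_MATCH) r = checkDictionary(info, value);
        if (r != Result.MIGHT_MATCH) return r == Result.MUST_MATCH;
        return bloom.mightContain(value);
    }

    public static void main(String[] args) {
        BloomFilterLoader bloom = v -> {
            System.out.println("loading Bloom filter");
            return true;
        };
        ColumnChunkInfo constant = new ColumnChunkInfo(7, 7, Optional.empty());
        System.out.println(canMatch(constant, 7, bloom)); // prints true, no Bloom filter load
        System.out.println(canMatch(constant, 8, bloom)); // prints false
        ColumnChunkInfo dict = new ColumnChunkInfo(1, 100, Optional.of(Set.of(1L, 50L, 100L)));
        System.out.println(canMatch(dict, 42, bloom)); // prints false, no Bloom filter load
        ColumnChunkInfo plain = new ColumnChunkInfo(1, 100, Optional.empty());
        System.out.println(canMatch(plain, 42, bloom)); // prints "loading Bloom filter", then true
    }
}
```

The ordering is the point of the design: statistics and dictionary checks use metadata that is already cheap to obtain, so an exact `MUST_MATCH` or `CANNOT_MATCH` from either one lets the reader avoid the filesystem read for the Bloom filter entirely.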