[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684266#comment-17684266 ]
ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------

yabola commented on code in PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1096650696

##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java:
##########
@@ -98,16 +99,19 @@ public List<BlockMetaData> visit(FilterCompat.FilterPredicateCompat filterPredic
     for (BlockMetaData block : blocks) {
       boolean drop = false;
+      // Whether one filter can exactly determine the existence/nonexistence of the value.
+      // If true then we can skip the remaining filters to save time and space.
+      AtomicBoolean canExactlyDetermine = new AtomicBoolean(false);

Review Comment:
   It was originally there to make it convenient to fetch the returned result, but I will switch to a different implementation later.

> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2237
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2237
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major
>
> If the minMax statistics alone give an exact answer, we no longer need to
> load the dictionary from the filesystem and compare its entries one by one.
> Similarly, the Bloom filter has to be loaded from the filesystem, which may
> cost time and memory. If the minMax or dictionary filters can exactly
> determine the existence/nonexistence of the value, we can skip the Bloom
> filter to improve performance.
> For example,
> # When reading data greater than {{x1}} in a block, if the minMax statistics
> show that every value is greater than {{x1}}, we don't need to read the
> dictionary and compare one by one.
> # If we have already read the page dictionaries and compared one by one, we
> don't need to read the Bloom filter and compare.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
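
For readers following the thread, the short-circuit idea in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in PR #1023 and not the parquet-mr API: the class `ExactFilterSketch`, the `Verdict` enum, and the methods `statsGreaterThan` and `mustRead` are all hypothetical names, the predicate is fixed to "value > x" on `long` values, and nulls are ignored.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of exact-match short-circuiting; not the parquet-mr API. */
class ExactFilterSketch {

    /** Outcome of one filter level for one row group. */
    enum Verdict { DROP, KEEP, UNKNOWN }

    /** Min/max check for the predicate "value > x" (assumes no nulls). */
    static Verdict statsGreaterThan(long min, long max, long x) {
        if (max <= x) return Verdict.DROP; // no value in the block can exceed x
        if (min > x) return Verdict.KEEP;  // every value in the block exceeds x
        return Verdict.UNKNOWN;            // x falls inside [min, max): undecided
    }

    /**
     * Evaluates filters lazily, cheapest first, and stops at the first
     * definite verdict so that later (more expensive) filters are never loaded.
     * Returns true if the row group has to be read.
     */
    static boolean mustRead(List<Supplier<Verdict>> filters) {
        for (Supplier<Verdict> filter : filters) {
            Verdict v = filter.get();
            if (v != Verdict.UNKNOWN) {
                return v == Verdict.KEEP;
            }
        }
        return true; // no filter could rule the block out, so keep it
    }

    public static void main(String[] args) {
        int[] dictionaryLoads = {0}; // counts how often the costly filter runs
        Supplier<Verdict> stats = () -> statsGreaterThan(10, 20, 5);
        Supplier<Verdict> dictionary = () -> {
            dictionaryLoads[0]++;
            return Verdict.UNKNOWN;
        };
        boolean read = mustRead(List.of(stats, dictionary));
        // prints "read=true, dictionaryLoads=0"
        System.out.println("read=" + read + ", dictionaryLoads=" + dictionaryLoads[0]);
    }
}
```

With a block whose values span 10 to 20 and a predicate of "greater than 5", the statistics level already answers KEEP, so the dictionary supplier is never invoked. A three-valued verdict is one way to carry the "exactly determined" signal without a separate mutable flag; whether that fits the actual visitor structure in `RowGroupFilter` is an open question for the PR.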