[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686841#comment-17686841 ]
ASF GitHub Bot commented on PARQUET-2237: ----------------------------------------- wgtmac commented on code in PR #1023: URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102202759 ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); Review Comment: ```suggestion public static final boolean BLOCK_MIGHT_MATCH = false; ``` ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); + /* The block can match for sure. */ + public static final Boolean BLOCK_MUST_MATCH = new Boolean(false); + /* The block can't match for sure */ + public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true); + + public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = and.getLeft().accept(predicate); + Boolean right = and.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH || right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH && right == BLOCK_MUST_MATCH) { + // if left and right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean evaluateOr(Operators.Or or, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = or.getLeft().accept(predicate); + Boolean right = or.getRight().accept(predicate); Review Comment: ditto ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java: ########## @@ -97,21 +100,37 @@ public List<BlockMetaData> visit(FilterCompat.FilterPredicateCompat filterPredic List<BlockMetaData> filteredBlocks = new ArrayList<BlockMetaData>(); for (BlockMetaData block : blocks) { - boolean drop = false; + Boolean predicate = BLOCK_MIGHT_MATCH; + + if (levels.contains(FilterLevel.STATISTICS)) { + + predicate = StatisticsFilter.evaluate(filterPredicate, block.getColumns()); - if(levels.contains(FilterLevel.STATISTICS)) { - drop = StatisticsFilter.canDrop(filterPredicate, block.getColumns()); + if(predicate == BLOCK_MUST_MATCH) { Review Comment: ```suggestion if (predicate == BLOCK_MUST_MATCH) { ``` ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java: ########## @@ -289,8 +320,14 @@ public <T extends Comparable<T>> Boolean visit(Lt<T> lt) { T value = lt.getValue(); - // drop if value <= min - return stats.compareMinToValue(value) >= 0; + // we are looking for records where v < someValue + if (stats.compareMinToValue(value) >= 0) { + // drop if value <= min + return BLOCK_CANNOT_MATCH; + } else { + // if value > min, we must take it + return BLOCK_MUST_MATCH; Review Comment: Sometimes the min/max values are actually lower/upper bounds. Does this optimization still work for that case? ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java: ########## @@ -392,15 +420,15 @@ public <T extends Comparable<T>> Boolean visit(In<T> in) { try { Set<T> dictSet = expandDictionary(meta); if (dictSet != null) { - return drop(dictSet, values); + return predicate(dictSet, values); } } catch (IOException e) { LOG.warn("Failed to process dictionary for filter evaluation.", e); } return BLOCK_MIGHT_MATCH; // cannot drop the row group based on this dictionary } - private <T extends Comparable<T>> Boolean drop(Set<T> dictSet, Set<T> values) { + private <T extends Comparable<T>> Boolean predicate(Set<T> dictSet, Set<T> values) { Review Comment: Use a verb instead of noun. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. Review Comment: We'd better explain explicitly that the evaluation decision it whether to drop the row group. That's why BLOCK_MIGHT_MATCH is false and the AND expression uses `||` in the implementation. This is counter-intuitive at the first glance. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/bloomfilterlevel/BloomFilterImpl.java: ########## @@ -28,26 +33,29 @@ import org.slf4j.LoggerFactory; import org.apache.parquet.column.values.bloomfilter.BloomFilter; +import org.apache.parquet.filter2.compat.PredicateEvaluation; import org.apache.parquet.filter2.predicate.FilterPredicate; import org.apache.parquet.filter2.predicate.Operators; import org.apache.parquet.filter2.predicate.UserDefinedPredicate; import org.apache.parquet.hadoop.BloomFilterReader; import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData; import org.apache.parquet.hadoop.metadata.ColumnPath; -import static org.apache.parquet.Preconditions.checkNotNull; - public class BloomFilterImpl implements FilterPredicate.Visitor<Boolean>{ private static final Logger LOG = LoggerFactory.getLogger(BloomFilterImpl.class); - private static final boolean BLOCK_MIGHT_MATCH = false; - private static final boolean BLOCK_CANNOT_MATCH = true; private final Map<ColumnPath, ColumnChunkMetaData> columns = new HashMap<ColumnPath, ColumnChunkMetaData>(); - public static boolean canDrop(FilterPredicate pred, List<ColumnChunkMetaData> columns, BloomFilterReader bloomFilterReader) { + public static Boolean evaluate(FilterPredicate pred, List<ColumnChunkMetaData> columns, BloomFilterReader bloomFilterReader) { Review Comment: Why use `Boolean` everywhere? It seems an overkill to me. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); + /* The block can match for sure. */ + public static final Boolean BLOCK_MUST_MATCH = new Boolean(false); + /* The block can't match for sure */ + public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true); + + public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = and.getLeft().accept(predicate); Review Comment: In the current implementation, left and right predicates are always evaluated. We can short cut the evaluation if `left == BLOCK_CANNOT_MATCH` and skip evaluation of right predicate. It would save some cost if the right expr will read dictionary. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); + /* The block can match for sure. */ + public static final Boolean BLOCK_MUST_MATCH = new Boolean(false); + /* The block can't match for sure */ + public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true); + + public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = and.getLeft().accept(predicate); + Boolean right = and.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH || right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH && right == BLOCK_MUST_MATCH) { + // if left and right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean evaluateOr(Operators.Or or, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = or.getLeft().accept(predicate); + Boolean right = or.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH && right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH || right == BLOCK_MUST_MATCH) { + // if left or right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean isDeterminedPredicate(Boolean predicate) { Review Comment: Please add some comments to the public methods. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); + /* The block can match for sure. */ + public static final Boolean BLOCK_MUST_MATCH = new Boolean(false); + /* The block can't match for sure */ + public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true); + + public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = and.getLeft().accept(predicate); + Boolean right = and.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH || right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH && right == BLOCK_MUST_MATCH) { + // if left and right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean evaluateOr(Operators.Or or, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = or.getLeft().accept(predicate); + Boolean right = or.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH && right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH || right == BLOCK_MUST_MATCH) { + // if left or right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean isDeterminedPredicate(Boolean predicate) { + return predicate == BLOCK_MUST_MATCH || predicate == BLOCK_CANNOT_MATCH; + } + + public static void checkPredicate(Boolean predicate) { Review Comment: ```suggestion public static void checkPredicate(boolean predicate) { ``` ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java: ########## @@ -97,21 +100,37 @@ public List<BlockMetaData> visit(FilterCompat.FilterPredicateCompat filterPredic List<BlockMetaData> filteredBlocks = new ArrayList<BlockMetaData>(); for (BlockMetaData block : blocks) { - boolean drop = false; + Boolean predicate = BLOCK_MIGHT_MATCH; + + if (levels.contains(FilterLevel.STATISTICS)) { + Review Comment: Please remove this blank line. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java: ########## @@ -392,15 +420,15 @@ public <T extends Comparable<T>> Boolean visit(In<T> in) { try { Set<T> dictSet = expandDictionary(meta); if (dictSet != null) { - return drop(dictSet, values); + return predicate(dictSet, values); } } catch (IOException e) { LOG.warn("Failed to process dictionary for filter evaluation.", e); } return BLOCK_MIGHT_MATCH; // cannot drop the row group based on this dictionary } - private <T extends Comparable<T>> Boolean drop(Set<T> dictSet, Set<T> values) { + private <T extends Comparable<T>> Boolean predicate(Set<T> dictSet, Set<T> values) { Review Comment: ```suggestion private <T extends Comparable<T>> Boolean evaluate(Set<T> dictSet, Set<T> values) { ``` ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java: ########## @@ -97,21 +100,37 @@ public List<BlockMetaData> visit(FilterCompat.FilterPredicateCompat filterPredic List<BlockMetaData> filteredBlocks = new ArrayList<BlockMetaData>(); for (BlockMetaData block : blocks) { - boolean drop = false; + Boolean predicate = BLOCK_MIGHT_MATCH; + + if (levels.contains(FilterLevel.STATISTICS)) { + + predicate = StatisticsFilter.evaluate(filterPredicate, block.getColumns()); - if(levels.contains(FilterLevel.STATISTICS)) { - drop = StatisticsFilter.canDrop(filterPredicate, block.getColumns()); + if(predicate == BLOCK_MUST_MATCH) { + filteredBlocks.add(block); + } + if(isDeterminedPredicate(predicate)) { + continue; + } } - if(!drop && levels.contains(FilterLevel.DICTIONARY)) { - drop = DictionaryFilter.canDrop(filterPredicate, block.getColumns(), reader.getDictionaryReader(block)); + if (levels.contains(FilterLevel.DICTIONARY)) { + Review Comment: ditto ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); Review Comment: It will deal with boxing and unboxing automatically. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java: ########## @@ -97,21 +100,37 @@ public List<BlockMetaData> visit(FilterCompat.FilterPredicateCompat filterPredic List<BlockMetaData> filteredBlocks = new ArrayList<BlockMetaData>(); for (BlockMetaData block : blocks) { - boolean drop = false; + Boolean predicate = BLOCK_MIGHT_MATCH; + + if (levels.contains(FilterLevel.STATISTICS)) { + + predicate = StatisticsFilter.evaluate(filterPredicate, block.getColumns()); - if(levels.contains(FilterLevel.STATISTICS)) { - drop = StatisticsFilter.canDrop(filterPredicate, block.getColumns()); + if(predicate == BLOCK_MUST_MATCH) { Review Comment: Same comment for style of lines below. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java: ########## @@ -129,8 +139,9 @@ public <T extends Comparable<T>> Boolean visit(Eq<T> eq) { return BLOCK_MIGHT_MATCH; } // we are looking for records where v eq(null) - // so drop if there are no nulls in this chunk - return !hasNulls(meta); + // so if there are no nulls in this chunk, we can drop it, + // if there has nulls in this chunk, we must take it + return !hasNulls(meta) ? BLOCK_CANNOT_MATCH : BLOCK_MUST_MATCH; Review Comment: I suspect that hasNull will always return precise information. When null_count is missing, hasNulls also returns true. Replacing `BLOCK_MUST_MATCH` with `BLOCK_MIGHT_MATCH` here makes more sense to me. ########## parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java: ########## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.parquet.filter2.compat; + +import org.apache.parquet.filter2.predicate.FilterPredicate; +import org.apache.parquet.filter2.predicate.Operators; + +/** + * Used in Filters to mark whether the block data matches the condition. + * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH. + * + * We use Boolean Object here to distinguish the value type, please do not modify it. + */ +public class PredicateEvaluation { + /* The block might match, but we cannot decide yet, will check in the other filters. */ + public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false); + /* The block can match for sure. */ + public static final Boolean BLOCK_MUST_MATCH = new Boolean(false); + /* The block can't match for sure */ + public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true); + + public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = and.getLeft().accept(predicate); + Boolean right = and.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH || right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH && right == BLOCK_MUST_MATCH) { + // if left and right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean evaluateOr(Operators.Or or, FilterPredicate.Visitor<Boolean> predicate) { + Boolean left = or.getLeft().accept(predicate); + Boolean right = or.getRight().accept(predicate); + if (left == BLOCK_CANNOT_MATCH && right == BLOCK_CANNOT_MATCH) { + return BLOCK_CANNOT_MATCH; + } else if (left == BLOCK_MUST_MATCH || right == BLOCK_MUST_MATCH) { + // if left or right operation must need the block, then we must take the block + return BLOCK_MUST_MATCH; + } else { + return BLOCK_MIGHT_MATCH; + } + } + + public static Boolean isDeterminedPredicate(Boolean predicate) { + return predicate == BLOCK_MUST_MATCH || predicate == BLOCK_CANNOT_MATCH; + } + + public static void checkPredicate(Boolean predicate) { Review Comment: If `BLOCK_CANNOT_MATCH` is defined as `Boolean` but not as `boolean`, the equality check here may fail if the input predicate is created in other places. Either replace `==` with `.equals()` or use `boolean` as your choice. > Improve performance when filters in RowGroupFilter can match exactly > -------------------------------------------------------------------- > > Key: PARQUET-2237 > URL: https://issues.apache.org/jira/browse/PARQUET-2237 > Project: Parquet > Issue Type: Improvement > Reporter: Mars > Priority: Major > > If we can accurately judge by the minMax status, we don’t need to load the > dictionary from filesystem and compare one by one anymore. > Similarly , Bloomfilter needs to load from filesystem, it may costs time and > memory. If we can exactly determine the existence/nonexistence of the value > from minMax or dictionary filters , then we can avoid using Bloomfilter to > Improve performance. > For example, > # read data greater than {{x1}} in the block, if minMax in status is all > greater than {{{}x1{}}}, then we don't need to read dictionary and compare > one by one. > # If we already have page dictionaries and have compared one by one, we > don't need to read BloomFilter and compare. -- This message was sent by Atlassian Jira (v8.20.10#820010)