[ https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687119#comment-17687119 ]
ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------

yabola commented on code in PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1102881433


##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/PredicateEvaluation.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.filter2.compat;
+
+import org.apache.parquet.filter2.predicate.FilterPredicate;
+import org.apache.parquet.filter2.predicate.Operators;
+
+/**
+ * Used in Filters to mark whether the block data matches the condition.
+ * If we cannot decide whether the block matches, it will be always safe to return BLOCK_MIGHT_MATCH.
+ *
+ * We use Boolean Object here to distinguish the value type, please do not modify it.
+ */
+public class PredicateEvaluation {
+  /* The block might match, but we cannot decide yet, will check in the other filters. */
+  public static final Boolean BLOCK_MIGHT_MATCH = new Boolean(false);
+  /* The block can match for sure.
+   */
+  public static final Boolean BLOCK_MUST_MATCH = new Boolean(false);
+  /* The block can't match for sure */
+  public static final Boolean BLOCK_CANNOT_MATCH = new Boolean(true);
+
+  public static Boolean evaluateAnd(Operators.And and, FilterPredicate.Visitor<Boolean> predicate) {
+    Boolean left = and.getLeft().accept(predicate);

Review Comment:
   Yes, thanks




> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
>                 Key: PARQUET-2237
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2237
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Mars
>            Priority: Major
>
> If the min/max statistics are enough to decide the result exactly, we don't need to load
> the dictionary from the filesystem and compare values one by one anymore.
> Similarly, the Bloom filter has to be loaded from the filesystem, which may cost time and
> memory. If we can already determine exactly whether the value exists from the min/max or
> dictionary filters, we can skip the Bloom filter to improve performance.
> For example:
> # When reading data greater than {{x1}} in a block, if the min/max statistics show that all
> values are greater than {{x1}}, we don't need to read the dictionary and compare values one by one.
> # If we have already read the page dictionaries and compared values one by one, we don't need
> to read the Bloom filter and compare again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
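The `PredicateEvaluation` constants in the diff rely on object identity: three distinct `Boolean` instances, two of which wrap `false`, are told apart with `==`, while code that only reads the boolean value presumably keeps treating `true` as "drop the row group" and `false` as "keep it". The following is a minimal, self-contained sketch of that idea combined with the short-circuiting described in the issue (min/max first, then the dictionary, then the Bloom filter, stopping at the first exact answer). Every name in it (`TriStateSketch`, `eqFromMinMax`, `eqFromDictionary`, `eqFromBloom`, `evaluateChain`) is hypothetical and not a parquet-mr API; it assumes an equality predicate on a `long` column, non-null min/max statistics, and a chunk whose pages are all dictionary-encoded. The Bloom filter is only simulated by a flag.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not parquet-mr code: a tri-state result encoded as
 * distinct Boolean objects, plus a filter chain that stops at the first
 * decisive answer so later (more expensive) filters are never consulted.
 */
public class TriStateSketch {
  // Compared by reference (==). Code that only reads the boolean value
  // still sees true as "drop the row group" and false as "keep it".
  @SuppressWarnings({"deprecation", "removal"})
  static final Boolean MIGHT_MATCH = new Boolean(false);
  @SuppressWarnings({"deprecation", "removal"})
  static final Boolean MUST_MATCH = new Boolean(false);
  @SuppressWarnings({"deprecation", "removal"})
  static final Boolean CANNOT_MATCH = new Boolean(true);

  // Count how often the simulated expensive filters are consulted.
  static final AtomicInteger dictionaryLoads = new AtomicInteger();
  static final AtomicInteger bloomLoads = new AtomicInteger();

  /** Min/max check for "column == v"; assumes non-null min/max statistics. */
  static Boolean eqFromMinMax(long min, long max, long v) {
    if (v < min || v > max) return CANNOT_MATCH; // v is outside [min, max]
    if (min == v && max == v) return MUST_MATCH; // every value equals v
    return MIGHT_MATCH;                          // undecided, ask the next filter
  }

  /** Dictionary check; exact only if every page in the chunk is dictionary-encoded (assumed). */
  static Boolean eqFromDictionary(Set<Long> dictionary, long v) {
    dictionaryLoads.incrementAndGet();
    if (dictionary.contains(v)) return MUST_MATCH;
    return CANNOT_MATCH;
  }

  /** Bloom filter stand-in: it can rule a value out but never confirm it. */
  static Boolean eqFromBloom(boolean mightContain) {
    bloomLoads.incrementAndGet();
    if (mightContain) return MIGHT_MATCH;
    return CANNOT_MATCH;
  }

  /** Runs filters in order and returns the first result that is not MIGHT_MATCH. */
  @SafeVarargs
  static Boolean evaluateChain(Supplier<Boolean>... filters) {
    for (Supplier<Boolean> filter : filters) {
      Boolean result = filter.get();
      if (result != MIGHT_MATCH) return result; // identity check, not equals()
    }
    return MIGHT_MATCH;
  }

  static String name(Boolean r) {
    return r == MUST_MATCH ? "MUST_MATCH" : r == CANNOT_MATCH ? "CANNOT_MATCH" : "MIGHT_MATCH";
  }

  public static void main(String[] args) {
    Set<Long> dict = Set.of(3L, 7L, 42L);
    // Min/max alone rules the block out: dictionary and Bloom filter are never touched.
    Boolean r1 = evaluateChain(() -> eqFromMinMax(50, 90, 42),
        () -> eqFromDictionary(dict, 42), () -> eqFromBloom(true));
    // Min/max is undecided, the dictionary answers exactly: Bloom filter is never touched.
    Boolean r2 = evaluateChain(() -> eqFromMinMax(3, 42, 7),
        () -> eqFromDictionary(dict, 7), () -> eqFromBloom(true));
    System.out.println(name(r1) + " " + name(r2) + " dictionaryLoads=" + dictionaryLoads.get()
        + " bloomLoads=" + bloomLoads.get());
    // MUST_MATCH and MIGHT_MATCH are equal by value but not by identity.
    System.out.println(MUST_MATCH.equals(MIGHT_MATCH) + " " + (MUST_MATCH == MIGHT_MATCH));
  }
}
```

Running `main` prints `CANNOT_MATCH MUST_MATCH dictionaryLoads=1 bloomLoads=0` and then `true false`. An enum would be the more idiomatic choice in new code; the `Boolean`-identity trick appears to be chosen so that existing `FilterPredicate.Visitor<Boolean>` signatures and boolean-valued callers stay unchanged, which is also why the comment in the diff asks not to modify it.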