wgtmac commented on code in PR #1023:
URL: https://github.com/apache/parquet-mr/pull/1023#discussion_r1096546533


##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java:
##########
@@ -98,16 +99,19 @@ public List<BlockMetaData> 
visit(FilterCompat.FilterPredicateCompat filterPredic
 
     for (BlockMetaData block : blocks) {
       boolean drop = false;
+      // Whether one filter can exactly determine the existence/nonexistence 
of the value.
+      // If true then we can skip the remaining filters to save time and space.
+      AtomicBoolean canExactlyDetermine = new AtomicBoolean(false);

Review Comment:
   Why atomic?



##########
parquet-hadoop/src/test/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilterTest.java:
##########
@@ -792,6 +793,16 @@ public void testInverseUdpMissingColumn() throws Exception 
{
       canDrop(LogicalInverseRewriter.rewrite(not(userDefined(fake, 
nullRejecting))), ccmd, dictionaries));
   }
 
+  @Test
+  public void testCanSkipOtherFilters() {

Review Comment:
   The test looks a little bit insufficient. More kinds of predicates and 
compound predicates need to be covered. Also test of `RowGroupFilter` is 
missing.



##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -559,4 +591,12 @@ private static boolean 
hasNonDictionaryPages(ColumnChunkMetaData meta) {
       return true;
     }
   }
+
+  private <T extends Comparable<T>> void markCanExactlyDetermine(Set<T> 
dictSet) {
+    if (dictSet == null) {
+      canExactlyDetermine = false;

Review Comment:
   It seems that `canExactlyDetermine` should use `OR` to update its value. 
Otherwise, any predicate with a null dict will set it to false even if previous 
predicates have marked it to true.
   
   Additionally, we may have a chance to shortcut the evaluation as well if any 
predicate has set it to true.



##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java:
##########
@@ -98,16 +99,19 @@ public List<BlockMetaData> 
visit(FilterCompat.FilterPredicateCompat filterPredic
 
     for (BlockMetaData block : blocks) {
       boolean drop = false;
+      // Whether one filter can exactly determine the existence/nonexistence 
of the value.
+      // If true then we can skip the remaining filters to save time and space.
+      AtomicBoolean canExactlyDetermine = new AtomicBoolean(false);

Review Comment:
   I'd suggest rename `canExactlyDetermine` to `preciselyDetermined`. Or even 
better, use an enum something like below
   ```java
   enum PredicateEvaluation {
     CAN_DROP,     /* the block can be dropped for sure */
     CANNOT_DROP,  /* the block cannot be dropped for sure*/
     MAY_DROP,     /* cannot decide yet, may be dropped by other filter levels 
*/
   }
   ```
   In this way, we can merge the the two boolean values here. The downside is 
that the code may need more refactoring to add the enum value to different 
filter classes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to