wgtmac commented on code in PR #2941:
URL: https://github.com/apache/parquet-java/pull/2941#discussion_r1667871679
##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java:
##########
@@ -345,11 +374,14 @@ public <R> R accept(Visitor<R> visitor) {
}
/**
- * Applies a filtering Vistitor to the Contains predicate, traversing any
composed And or Or clauses,
- * and finally delegating to the underlying ColumnFilterPredicate.
+ * Applies a filtering Visitor to the Contains predicate, traversing any
composed And or Or clauses,
+ * and finally delegating to the underlying column predicate.
*/
public abstract <R> R filter(
- Visitor<R> visitor, BiFunction<R, R, R> andBehavior, BiFunction<R, R,
R> orBehavior);
+ Visitor<R> visitor,
+ BiFunction<R, R, R> andBehavior,
+ BiFunction<R, R, R> orBehavior,
+ Function<R, R> notBehavior);
Review Comment:
IIUC, all kinds of filters return all rows if `notBehavior` is found, right?
##########
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java:
##########
@@ -371,7 +371,11 @@ public <T extends Comparable<T>> PrimitiveIterator.OfInt
visit(NotIn<T> notIn) {
@Override
public <T extends Comparable<T>> PrimitiveIterator.OfInt visit(Contains<T>
contains) {
- return contains.filter(this, IndexIterator::intersection,
IndexIterator::union);
+ return contains.filter(
Review Comment:
Yes, we cannot make such subtraction unless it is a single-value page which
has only `bar`.
##########
parquet-column/src/main/java/org/apache/parquet/filter2/recordlevel/IncrementallyUpdatedFilterPredicate.java:
##########
@@ -145,6 +146,83 @@ public boolean accept(Visitor visitor) {
}
}
+ /**
+ * A ValueInspector implementation that keeps state for one or more delegate
inspectors.
+ */
+ abstract static class DelegatingValueInspector extends ValueInspector {
Review Comment:
`IncrementallyUpdatedFilterPredicateGenerator.java` is pretty difficult to
read and review. Since this is only applied to `ContainsPredicate` and does not
affect others, I'm fine to make this change.
##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/statisticslevel/StatisticsFilter.java:
##########
@@ -214,7 +214,7 @@ public <T extends Comparable<T>> Boolean visit(NotIn<T>
notIn) {
@Override
public <T extends Comparable<T>> Boolean visit(Contains<T> contains) {
- return contains.filter(this, (l, r) -> l || r, (l, r) -> l && r);
+ return contains.filter(this, (l, r) -> l || r, (l, r) -> l && r, b ->
BLOCK_MIGHT_MATCH);
Review Comment:
```suggestion
return contains.filter(this, (l, r) -> l || r, (l, r) -> l && r, v ->
BLOCK_MIGHT_MATCH);
```
Just want to be consistent with `BloomFilterImpl` and `DictionaryFilter`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]