clairemcginty commented on code in PR #1328:
URL: https://github.com/apache/parquet-mr/pull/1328#discussion_r1581166004
##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/FilterApi.java:
##########
@@ -257,6 +266,16 @@ public static <T extends Comparable<T>, C extends Column<T> & SupportsEqNotEq> N
return new NotIn<>(column, values);
}
+ public static <T extends Comparable<T>, C extends Column<T> & SupportsContains> Contains<T> contains(
Review Comment:
Actually, I think I need some clarification on the expected behavior of
And/Or when used inside `contains` on an array (repeated) field...
Say we have the following records, each containing a repeated int field:
```
Record1:
repeated_int_field: [2, 3, 4, 5]
Record2:
repeated_int_field: [1, 7]
```
Given the predicate `contains(and(gt(3), lt(6)))`, should we return:
1. Only Record1, because it contains **at least one element that satisfies
both predicates** `gt(3)` and `lt(6)` (e.g. 4 and 5), or
2. Both Record1 and Record2, because Record2 contains **elements that
satisfy each predicate separately** (7 matches `gt(3)` but not `lt(6)`, and 1
matches `lt(6)` but not `gt(3)`)?
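To make the two readings concrete, here is a minimal plain-Java sketch evaluated against the two example records. It does not use the parquet-mr filter API. The class `ContainsSemantics` and its methods `containsElementMatching` (option 1) and `containsEach` (option 2) are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical illustration only; not part of parquet-mr.
public class ContainsSemantics {

    // Option 1: a single element must satisfy the whole combined predicate.
    static boolean containsElementMatching(int[] values, IntPredicate predicate) {
        return Arrays.stream(values).anyMatch(predicate);
    }

    // Option 2: contains() is distributed over and(), so each sub-predicate
    // only needs some (possibly different) element that satisfies it.
    static boolean containsEach(int[] values, IntPredicate left, IntPredicate right) {
        return containsElementMatching(values, left) && containsElementMatching(values, right);
    }

    public static void main(String[] args) {
        IntPredicate gt3 = v -> v > 3;
        IntPredicate lt6 = v -> v < 6;
        int[] record1 = {2, 3, 4, 5};
        int[] record2 = {1, 7};

        System.out.println(containsElementMatching(record1, gt3.and(lt6))); // true (4 and 5 match)
        System.out.println(containsElementMatching(record2, gt3.and(lt6))); // false
        System.out.println(containsEach(record1, gt3, lt6)); // true
        System.out.println(containsEach(record2, gt3, lt6)); // true (7 > 3, 1 < 6)
    }
}
```

For `or`, the two readings agree. "Some element satisfies p or q" is equivalent to "some element satisfies p, or some element satisfies q", so the ambiguity only arises for `and`.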
I would lean toward 2 🤔, but I'm not totally sure.
cc @gszadovszky as well :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]