clairemcginty commented on code in PR #1328:
URL: https://github.com/apache/parquet-mr/pull/1328#discussion_r1579309150
##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/FilterApi.java:
##########
@@ -257,6 +266,16 @@ public static <T extends Comparable<T>, C extends Column<T> & SupportsEqNotEq> N
return new NotIn<>(column, values);
}
+ public static <T extends Comparable<T>, C extends Column<T> & SupportsContains> Contains<T> contains(
Review Comment:
It's not a standard SQL function, but I've seen it in SQL extension
languages such as [BigQuery Standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#in_operators),
and I've gotten several requests from users of the
[Scio](https://github.com/spotify/scio) Parquet library to support it!

That's a good point about making this composable. I think it would be more
efficient to do `CONTAINS(a or b)` than `CONTAINS(a) or CONTAINS(b)`. What do
you think about supporting `lt/gt` in addition to `eq`-based Contains? For
example, `CONTAINS(eq(a) OR gt(b))`? It would make this PR a lot more complex,
but I'm happy to try. We could probably reuse a lot of the existing filter
code for `eq`, `lt/gt`, etc.
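
To make the composability idea concrete, here is a minimal sketch of the semantics being discussed. It uses plain `java.util.function.Predicate` rather than the parquet-mr `FilterApi`. The class `ContainsSketch` and its `eq`, `gt`, and `contains` helpers are hypothetical names invented for illustration; they are not part of this PR or of the existing filter code.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the semantics discussed above.
// This is NOT the parquet-mr FilterApi; every name here is an assumption.
final class ContainsSketch {

    // Element-level predicate: true when the element equals `value`.
    static <T extends Comparable<T>> Predicate<T> eq(T value) {
        return v -> v.compareTo(value) == 0;
    }

    // Element-level predicate: true when the element is greater than `value`.
    static <T extends Comparable<T>> Predicate<T> gt(T value) {
        return v -> v.compareTo(value) > 0;
    }

    // contains(p): true if at least one element of the repeated field
    // satisfies p. Stops at the first matching element.
    static <T> boolean contains(List<T> repeatedValues, Predicate<T> elementPredicate) {
        for (T v : repeatedValues) {
            if (elementPredicate.test(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> record = List.of(1, 4, 9);

        Predicate<Integer> eq4 = eq(4);
        Predicate<Integer> gt10 = gt(10);

        // CONTAINS(eq(4) OR gt(10)): the composed predicate is tested
        // against each element, so the list is walked at most once.
        boolean composed = contains(record, eq4.or(gt10));

        // CONTAINS(eq(4)) OR CONTAINS(gt(10)): each side walks the list
        // on its own, so the list may be walked twice.
        boolean separate = contains(record, eq4) || contains(record, gt10);

        System.out.println(composed + " " + separate);
    }
}
```

In this sketch the two forms always return the same answer for `OR`; the difference is only that the composed form needs at most one pass over the repeated values, while the separate form may need one pass per predicate. Whether the same holds inside the real filter pipeline depends on how the column readers are implemented.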
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]