clairemcginty commented on code in PR #1328:
URL: https://github.com/apache/parquet-mr/pull/1328#discussion_r1579309150


##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/FilterApi.java:
##########
@@ -257,6 +266,16 @@ public static <T extends Comparable<T>, C extends Column<T> & SupportsEqNotEq> N
     return new NotIn<>(column, values);
   }
 
+  public static <T extends Comparable<T>, C extends Column<T> & SupportsContains> Contains<T> contains(

Review Comment:
   It's not a standard SQL function, but I've seen it in SQL extension languages such as [BigQuery Standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#in_operators), and I've gotten several requests from users of the [Scio](https://github.com/spotify/scio) Parquet library to support it!

   That's a good point about making this composable. I think `CONTAINS(a or b)` would be more efficient than `CONTAINS(a) or CONTAINS(b)`. What do you think about supporting `lt`/`gt` in addition to `eq`-based Contains, for example `CONTAINS(eq(a) OR gt(b))`? It would make this PR a lot more complex, but I'm happy to try. We could probably reuse a lot of the existing filter code for `eq`, `lt`/`gt`, etc.
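
   To make the composability idea concrete, here is a rough, self-contained sketch of the semantics only. It does not use the Parquet filter API: the class `ContainsSketch` and its `contains`, `eq`, and `gt` helpers are hypothetical names for illustration, built on plain `java.util.function.Predicate`. It shows how `CONTAINS(eq(a) OR gt(b))` can be answered in one pass over the repeated values, while `CONTAINS(eq(a)) OR CONTAINS(gt(b))` may need two.

```java
import java.util.List;
import java.util.function.Predicate;

public class ContainsSketch {
    // Hypothetical helper: true if any element of the repeated field
    // satisfies the element-level predicate. One pass over the elements.
    public static <T> boolean contains(List<T> elements, Predicate<T> elementPredicate) {
        for (T element : elements) {
            if (elementPredicate.test(element)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical element-level predicates, stand-ins for eq/gt filters.
    public static <T extends Comparable<T>> Predicate<T> eq(T value) {
        return v -> v.compareTo(value) == 0;
    }

    public static <T extends Comparable<T>> Predicate<T> gt(T value) {
        return v -> v.compareTo(value) > 0;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 5, 9);

        // CONTAINS(eq(5) OR gt(20)): the OR is applied per element, single pass.
        boolean composed = contains(values, eq(5).or(gt(20)));

        // CONTAINS(eq(5)) OR CONTAINS(gt(20)): up to two passes over the elements.
        boolean separate = contains(values, eq(5)) || contains(values, gt(20));

        System.out.println(composed + " " + separate);
    }
}
```

   For `OR` the two forms give the same answer, so the composed form is purely an optimization. That would not hold for `AND`: `CONTAINS(a AND b)` requires a single element to satisfy both predicates, which is stricter than `CONTAINS(a) AND CONTAINS(b)`.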



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

