aokolnychyi commented on a change in pull request #3613:
URL: https://github.com/apache/iceberg/pull/3613#discussion_r757705716
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
##########
@@ -168,9 +169,23 @@ public static Expression convert(Filter filter) {
case NOT:
Not notFilter = (Not) filter;
- Expression child = convert(notFilter.child());
- if (child != null) {
- return not(child);
+ Filter childFilter = notFilter.child();
+ Operation childOp = FILTERS.get(childFilter.getClass());
+ if (childOp == Operation.IN) {
+ // infer an extra notNull predicate for Spark NOT IN filters
+ // as Iceberg expressions don't follow the 3-value SQL boolean logic
+ // col NOT IN (1, 2) in Spark is equivalent to notNull(col) && notIn(col, 1, 2) in Iceberg
+ In childInFilter = (In) childFilter;
+ Expression notIn = notIn(unquote(childInFilter.attribute()),
+ Stream.of(childInFilter.values())
+ .map(SparkFilters::convertLiteral)
+ .collect(Collectors.toList()));
+ return and(notNull(childInFilter.attribute()), notIn);
+ } else if (hasNoInFilter(childFilter)) {
Review comment:
This check prevents translating NOT IN predicates that are nested inside a
NOT. I think the optimizer will push `NOT` down into the expression in most
cases, but I guess it is better to check explicitly. For example:
```
NOT (col1 > 10 AND col2 NOT IN (1, 2))
```
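To make the concern concrete, here is a rough sketch in plain Java (not the Iceberg or Spark API) of how the predicate above could diverge if the inner NOT IN were rewritten to `notNull(col2) && notIn(col2, 1, 2)` and then wrapped in a two-valued NOT. The class and method names (`NotInSemantics`, `sparkKeeps`, `naiveKeeps`) are hypothetical and exist only for illustration. It assumes Spark keeps a row only when the predicate evaluates to TRUE under SQL three-valued logic.

```java
import java.util.Arrays;
import java.util.List;

public class NotInSemantics {
  private static final List<Integer> VALUES = Arrays.asList(1, 2);

  // SQL three-valued helpers: a null Boolean stands for UNKNOWN.
  static Boolean sqlNotIn(Integer v) {
    return v == null ? null : !VALUES.contains(v);
  }

  static Boolean sqlAnd(Boolean a, Boolean b) {
    if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
      return false; // FALSE dominates, even next to UNKNOWN
    }
    return (a == null || b == null) ? null : Boolean.TRUE;
  }

  static Boolean sqlNot(Boolean a) {
    return a == null ? null : !a;
  }

  // NOT (col1 > 10 AND col2 NOT IN (1, 2)) under three-valued logic.
  // The row is kept only if the result is TRUE.
  public static boolean sparkKeeps(Integer col1, Integer col2) {
    Boolean gt = col1 == null ? null : col1 > 10;
    return Boolean.TRUE.equals(sqlNot(sqlAnd(gt, sqlNotIn(col2))));
  }

  // Same predicate under two-valued logic, with the inner NOT IN rewritten
  // as notNull(col2) && notIn(col2, 1, 2) before the outer NOT is applied.
  public static boolean naiveKeeps(Integer col1, Integer col2) {
    boolean gt = col1 != null && col1 > 10;
    boolean notNullAndNotIn = col2 != null && !VALUES.contains(col2);
    return !(gt && notNullAndNotIn);
  }

  public static void main(String[] args) {
    // col1 = 20, col2 = NULL: the two evaluations disagree.
    System.out.println("spark keeps: " + sparkKeeps(20, null));
    System.out.println("naive keeps: " + naiveKeeps(20, null));
  }
}
```

With `col1 = 20` and `col2 = NULL`, the three-valued evaluation yields UNKNOWN (row dropped), while the naive two-valued rewrite yields true (row kept), because the outer NOT flips the inferred `notNull` check.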