[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3613: Spark: Fix NOT IN predicates in SparkFilters

GitBox Fri, 26 Nov 2021 14:15:38 -0800


aokolnychyi commented on a change in pull request #3613:
URL: https://github.com/apache/iceberg/pull/3613#discussion_r757705716




##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
##########
@@ -168,9 +169,23 @@ public static Expression convert(Filter filter) {
 
         case NOT:
           Not notFilter = (Not) filter;
-          Expression child = convert(notFilter.child());
-          if (child != null) {
-            return not(child);
+          Filter childFilter = notFilter.child();
+          Operation childOp = FILTERS.get(childFilter.getClass());
+          if (childOp == Operation.IN) {
+            // infer an extra notNull predicate for Spark NOT IN filters
+            // as Iceberg expressions don't follow the 3-value SQL boolean logic
+            // col NOT IN (1, 2) in Spark is equivalent to notNull(col) && notIn(col, 1, 2) in Iceberg
+            In childInFilter = (In) childFilter;
+            Expression notIn = notIn(unquote(childInFilter.attribute()),
+                Stream.of(childInFilter.values())
+                    .map(SparkFilters::convertLiteral)
+                    .collect(Collectors.toList()));
+            return and(notNull(childInFilter.attribute()), notIn);
+          } else if (hasNoInFilter(childFilter)) {

Review comment:
   It prevents translation of NOT IN predicates nested inside another NOT. I
   think the optimizer will push `NOT` down into the expression in most cases,
   but it is safer to check explicitly. For example:
   
   ```
   NOT (col1 > 10 AND col2 NOT IN (1, 2))
   ```
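The nested case can be made concrete with a recursive search for an `In` anywhere under the child of a `NOT`. This sketch does not use Spark's real `org.apache.spark.sql.sources` classes. It uses hypothetical, simplified filter records (`Filter`, `In`, `GreaterThan`, `Not`, `And`, `Or`). The PR's own `hasNoInFilter` is not shown in this hunk, so `containsIn` and the `hasNoInFilter` below are assumptions about its shape, not its actual code.

```java
import java.util.List;

final class NestedInCheck {
  // Hypothetical, simplified stand-ins for Spark's data source filters.
  interface Filter {}
  record In(String attribute, List<?> values) implements Filter {}
  record GreaterThan(String attribute, Object value) implements Filter {}
  record Not(Filter child) implements Filter {}
  record And(Filter left, Filter right) implements Filter {}
  record Or(Filter left, Filter right) implements Filter {}

  // Returns true if an In filter appears anywhere in the filter tree.
  static boolean containsIn(Filter filter) {
    if (filter instanceof In) {
      return true;
    } else if (filter instanceof Not not) {
      return containsIn(not.child());
    } else if (filter instanceof And and) {
      return containsIn(and.left()) || containsIn(and.right());
    } else if (filter instanceof Or or) {
      return containsIn(or.left()) || containsIn(or.right());
    }
    return false; // leaf comparisons such as GreaterThan
  }

  // Hypothetical counterpart of the PR's hasNoInFilter.
  static boolean hasNoInFilter(Filter filter) {
    return !containsIn(filter);
  }

  public static void main(String[] args) {
    // NOT (col1 > 10 AND col2 NOT IN (1, 2))
    Filter child = new And(
        new GreaterThan("col1", 10),
        new Not(new In("col2", List.of(1, 2))));
    Filter filter = new Not(child);
    System.out.println(containsIn(filter)); // true
    System.out.println(hasNoInFilter(child)); // false
  }
}
```

In this example, `hasNoInFilter(child)` is false because the `And` contains a `NOT IN`. A converter with the diff's branch structure would therefore not take the `hasNoInFilter` branch for this child.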

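A small three-valued logic model shows why a nested NOT IN is unsafe to translate under an outer NOT. The model assumes, as the diff comment implies, that a two-valued `notIn` treats a null value as not being in the list. `sqlNotIn`, `rewritten`, and the other helpers are hypothetical.

For a null `col`, Spark's `col NOT IN (1, 2)` is UNKNOWN. The rewrite `notNull(col) && notIn(col, 1, 2)` is false, so both drop the row. Under an outer NOT, Spark still gets UNKNOWN, but the negated rewrite becomes true.

```java
import java.util.Arrays;
import java.util.List;

final class ThreeValuedNotIn {
  // SQL three-valued NOT IN, using null for UNKNOWN.
  // Assumes the IN list itself contains no nulls.
  static Boolean sqlNotIn(Integer value, List<Integer> list) {
    if (value == null) {
      return null; // NULL NOT IN (...) is UNKNOWN
    }
    return !list.contains(value);
  }

  // SQL NOT: NOT UNKNOWN is still UNKNOWN.
  static Boolean sqlNot(Boolean b) {
    return b == null ? null : !b;
  }

  // A WHERE clause keeps a row only when the predicate is TRUE.
  static boolean keeps(Boolean b) {
    return Boolean.TRUE.equals(b);
  }

  // Hypothetical two-valued notIn: a null value counts as "not in the list".
  static boolean twoValuedNotIn(Integer value, List<Integer> list) {
    return value == null || !list.contains(value);
  }

  // The rewrite from the diff comment: notNull(col) && notIn(col, 1, 2).
  static boolean rewritten(Integer value, List<Integer> list) {
    return value != null && twoValuedNotIn(value, list);
  }

  public static void main(String[] args) {
    List<Integer> list = List.of(1, 2);
    for (Integer col : Arrays.asList(3, 1, null)) {
      System.out.printf(
          "col=%s  NOT IN: spark=%b rewrite=%b  NOT(NOT IN): spark=%b rewrite=%b%n",
          col,
          keeps(sqlNotIn(col, list)),
          rewritten(col, list),
          keeps(sqlNot(sqlNotIn(col, list))),
          !rewritten(col, list));
    }
  }
}
```

Only one case disagrees: `col = null` under the outer NOT. There, Spark drops the row while the negated rewrite keeps it. This is the case the explicit check above guards against.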


