ConeyLiu commented on code in PR #8446:
URL: https://github.com/apache/iceberg/pull/8446#discussion_r1331162608
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java:
##########
@@ -161,10 +162,13 @@ public static Expression convert(Filter filter) {
case IN:
In inFilter = (In) filter;
+ if (Stream.of(inFilter.values()).anyMatch(Objects::isNull)) {
Review Comment:
Thanks @aokolnychyi @rdblue for the explanations. So the reason we can
push down safely is:
1. A `NULL` literal in a predicate is optimized into `false` by Spark.
2. We have special handling for `NOT IN` and do not allow any other `IN`
nested within `NOT`. Spark's `col NOT IN (1, 2)` is converted to the
Iceberg expression `notNull(col) && notIn(col, 1, 2)`.
I have a question about `IN`: is it also safe to convert `col IN (1, 2)` to
the Iceberg expression `notNull(col) && in(col, 1, 2)`? Why not do that?
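To make the reasoning in point 1 concrete, here is a small standalone sketch (not the actual SparkFilters code) showing the null check from the diff and why dropping `NULL` from an `IN` list is safe under SQL three-valued logic: `col IN (1, 2, NULL)` is TRUE when `col` matches 1 or 2 and NULL otherwise, and a `WHERE` clause discards NULL just like FALSE.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Stream;

public class InNullDemo {
  public static void main(String[] args) {
    // Hypothetical IN-list values, mirroring Spark's In#values()
    Object[] values = {1, 2, null};

    // The same null detection used in the diff above
    boolean hasNull = Stream.of(values).anyMatch(Objects::isNull);
    System.out.println(hasNull); // true

    // Dropping the NULL literal preserves the rows a WHERE clause keeps,
    // because `col IN (..., NULL)` can never evaluate to TRUE via the
    // NULL element; it only turns FALSE results into NULL, and both are
    // filtered out.
    Object[] withoutNulls = Stream.of(values).filter(Objects::nonNull).toArray();
    System.out.println(Arrays.toString(withoutNulls)); // [1, 2]
  }
}
```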
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]