ConeyLiu commented on code in PR #8446:
URL: https://github.com/apache/iceberg/pull/8446#discussion_r1331162608
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java:
##########
@@ -161,10 +162,13 @@ public static Expression convert(Filter filter) {
case IN:
In inFilter = (In) filter;
+ if (Stream.of(inFilter.values()).anyMatch(Objects::isNull)) {
Review Comment:
Thanks @aokolnychyi @rdblue for the explanations. So the reason we can
push down safely is:
1. A `NULL` literal in a predicate is optimized into `false` by Spark.
2. We have special handling for `NOT IN` and do not allow any other `IN`
nested within `NOT`. Spark's `col NOT IN (1, 2)` is converted to the
Iceberg expression `notNull(col) && notIn(col, 1, 2)`.
I have a question about `IN`: is it also safe to convert `col IN (1, 2)` to
the Iceberg expression `notNull(col) && in(col, 1, 2)`? Why not do that?
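To make the reasoning in point 1 concrete, here is a small standalone sketch (not the actual SparkFilters code) showing the null check from the diff and why dropping `NULL` from an `IN` list is safe under SQL three-valued logic: `col IN (1, 2, NULL)` is TRUE when `col` matches 1 or 2 and NULL otherwise, and a `WHERE` clause discards NULL just like FALSE.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Stream;

public class InNullDemo {
  public static void main(String[] args) {
    // Hypothetical IN-list values, mirroring Spark's In#values()
    Object[] values = {1, 2, null};

    // The same null detection used in the diff above
    boolean hasNull = Stream.of(values).anyMatch(Objects::isNull);
    System.out.println(hasNull); // true

    // Dropping the NULL literal preserves the rows a WHERE clause keeps,
    // because `col IN (..., NULL)` can never evaluate to TRUE via the
    // NULL element; it only turns FALSE results into NULL, and both are
    // filtered out.
    Object[] withoutNulls = Stream.of(values).filter(Objects::nonNull).toArray();
    System.out.println(Arrays.toString(withoutNulls)); // [1, 2]
  }
}
```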
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]