AdamGS commented on code in PR #22810:
URL: https://github.com/apache/datafusion/pull/22810#discussion_r3420396412


##########
datafusion/sqllogictest/test_files/subquery.slt:
##########
@@ -2437,3 +2437,45 @@ DROP TABLE sq_count_customer;
 
 statement ok
 DROP TABLE sq_count_orders;
+
+# Regression test: `NOT IN` is a null-aware anti join. When the subquery 
yields a
+# NULL the predicate is never TRUE, so the query must return zero rows. This 
must
+# hold regardless of the chosen physical join operator. Previously, with
+# prefer_hash_join = false and multiple partitions, the planner routed the
+# null-aware anti join to SortMergeJoin (which is not null-aware) and returned
+# wrong results; null-aware anti joins must use the CollectLeft HashJoin.
+statement ok

Review Comment:
   small nit here (I might be a bit overzealous about this) - but an empty line 
between the comment and the code here would make it much easier to read.



##########
datafusion/sqllogictest/test_files/subquery.slt:
##########
@@ -2437,3 +2437,45 @@ DROP TABLE sq_count_customer;
 
 statement ok
 DROP TABLE sq_count_orders;
+
+# Regression test: `NOT IN` is a null-aware anti join. When the subquery 
yields a
+# NULL the predicate is never TRUE, so the query must return zero rows. This 
must
+# hold regardless of the chosen physical join operator. Previously, with
+# prefer_hash_join = false and multiple partitions, the planner routed the
+# null-aware anti join to SortMergeJoin (which is not null-aware) and returned
+# wrong results; null-aware anti joins must use the CollectLeft HashJoin.
+statement ok
+set datafusion.optimizer.prefer_hash_join = false;
+
+statement ok
+CREATE TABLE nia_left(x INT) AS VALUES (1), (2), (3), (4);
+
+statement ok
+CREATE TABLE nia_right_with_null(y INT) AS VALUES (2), (NULL);
+
+statement ok
+CREATE TABLE nia_right_no_null(y INT) AS VALUES (2), (4);
+
+# Subquery contains a NULL -> NOT IN must return no rows.
+query I
+SELECT x FROM nia_left WHERE x NOT IN (SELECT y FROM nia_right_with_null) 
ORDER BY x;
+----
+
+# Subquery has no NULL -> NOT IN behaves like a normal anti join.

Review Comment:
   maybe worth adding an `EXPLAIN` test, if you rebase on `main` you'll #22913 
which shows explicitly the `null_aware` hash join.



##########
datafusion/sqllogictest/test_files/subquery.slt:
##########
@@ -2437,3 +2437,45 @@ DROP TABLE sq_count_customer;
 
 statement ok
 DROP TABLE sq_count_orders;
+
+# Regression test: `NOT IN` is a null-aware anti join. When the subquery 
yields a
+# NULL the predicate is never TRUE, so the query must return zero rows. This 
must
+# hold regardless of the chosen physical join operator. Previously, with
+# prefer_hash_join = false and multiple partitions, the planner routed the
+# null-aware anti join to SortMergeJoin (which is not null-aware) and returned
+# wrong results; null-aware anti joins must use the CollectLeft HashJoin.
+statement ok
+set datafusion.optimizer.prefer_hash_join = false;
+
+statement ok
+CREATE TABLE nia_left(x INT) AS VALUES (1), (2), (3), (4);
+
+statement ok
+CREATE TABLE nia_right_with_null(y INT) AS VALUES (2), (NULL);
+
+statement ok
+CREATE TABLE nia_right_no_null(y INT) AS VALUES (2), (4);
+
+# Subquery contains a NULL -> NOT IN must return no rows.
+query I
+SELECT x FROM nia_left WHERE x NOT IN (SELECT y FROM nia_right_with_null) 
ORDER BY x;
+----
+
+# Subquery has no NULL -> NOT IN behaves like a normal anti join.
+query I
+SELECT x FROM nia_left WHERE x NOT IN (SELECT y FROM nia_right_no_null) ORDER 
BY x;
+----
+1
+3
+
+statement ok
+DROP TABLE nia_left;
+
+statement ok
+DROP TABLE nia_right_with_null;
+
+statement ok
+DROP TABLE nia_right_no_null;
+
+statement ok
+set datafusion.optimizer.prefer_hash_join = true;

Review Comment:
   use `reset` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to