mdashti opened a new pull request, #23173:
URL: https://github.com/apache/datafusion/pull/23173

   ## Which issue does this close?
   
   Closes #23126.
   
   ## Rationale for this change
   
   `x NOT IN (subquery)` plans to a null-aware `LeftAnti` hash join (build = 
outer `x`, probe = subquery). Join dynamic filter pushdown pushes a bounds + 
membership filter, built from the build keys, onto the probe scan. That filter 
can prune every probe row. A null-aware `LeftAnti` treats an empty probe as a 
genuinely empty subquery, so it emits build-side NULL rows that should be 
dropped: `NULL NOT IN (non-empty)` is UNKNOWN, not TRUE.
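
   The three-valued semantics behind that last point can be sketched in a few lines. This is an illustrative model of SQL `NOT IN`, not DataFusion code; the function name `not_in` and the use of `None` for NULL/UNKNOWN are assumptions made for the example.

```python
from typing import Optional, Sequence


def not_in(x: Optional[int], subquery: Sequence[Optional[int]]) -> Optional[bool]:
    """Model of SQL `x NOT IN (subquery)`; None stands for NULL / UNKNOWN."""
    if len(subquery) == 0:
        # NOT IN over an empty set is TRUE, even when x is NULL.
        return True
    if x is None:
        # NULL compared against a non-empty set is UNKNOWN.
        return None
    if any(v is not None and v == x for v in subquery):
        return False
    if any(v is None for v in subquery):
        # No match found, but a NULL in the set leaves the answer UNKNOWN.
        return None
    return True
```

   Under this model, `not_in(None, [])` is TRUE while `not_in(None, [10, 20])` is UNKNOWN, which is why the join must know whether the probe is really empty.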
   
   The result is scan-dependent, so it's a silent correctness bug. A `VALUES` 
scan ignores the pushed filter and stays correct; a parquet scan applies it and 
is wrong.
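
   A toy model of the scan dependence, under stated assumptions: the helpers `null_aware_left_anti` and `build_dynamic_filter` are hypothetical stand-ins written for this sketch, not DataFusion's operators, and the data is made up.

```python
def null_aware_left_anti(build, probe):
    """Toy null-aware LeftAnti: returns the build rows that satisfy NOT IN."""
    if not probe:
        # Empty subquery: every build row passes, including NULL.
        return list(build)
    if any(p is None for p in probe):
        # A NULL on the probe side makes NOT IN UNKNOWN for every build row.
        return []
    probe_set = set(probe)
    # A NULL build row is UNKNOWN against a non-empty probe, so it is dropped.
    return [b for b in build if b is not None and b not in probe_set]


def build_dynamic_filter(build):
    """Toy bounds + membership filter derived from the non-NULL build keys."""
    keys = {b for b in build if b is not None}
    if not keys:
        return lambda p: False
    lo, hi = min(keys), max(keys)
    return lambda p: p is not None and lo <= p <= hi and p in keys


build = [1, None]   # outer `x`, with a NULL
probe = [10, 20]    # subquery rows, none matching

keep = build_dynamic_filter(build)
pruned_probe = [p for p in probe if keep(p)]  # a scan that applies the filter

unfiltered_result = null_aware_left_anti(build, probe)       # filter ignored
filtered_result = null_aware_left_anti(build, pruned_probe)  # filter applied
```

   Here the unfiltered probe gives the single row `1`, while the pruned probe is empty and the join also emits the NULL row.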
   
   `#23103` (the probe-side NULL drop) is orthogonal; this PR addresses the 
build-side NULL case.
   
   ## What changes are included in this PR?
   
   Skip join dynamic filter pushdown for a null-aware anti join when the build 
key can be NULL. The build-side NULL emission depends on whether the probe is 
truly empty, and the pushed filter can change that by pruning every probe row. 
A NOT NULL build key cannot produce such a row, so that case keeps the pushdown.
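
   The rule amounts to a small guard. The sketch below only illustrates the decision; the `JoinInfo` type, its fields, and `allow_dynamic_filter_pushdown` are hypothetical names, not the identifiers used in the actual change.

```python
from dataclasses import dataclass


@dataclass
class JoinInfo:
    # Hypothetical description of a hash join, for illustration only.
    join_type: str
    null_aware: bool
    build_key_nullable: bool


def allow_dynamic_filter_pushdown(join: JoinInfo) -> bool:
    """Return False only for a null-aware anti join with a nullable build key."""
    if join.join_type == "LeftAnti" and join.null_aware and join.build_key_nullable:
        # The pushed filter could empty the probe and change which
        # build-side NULL rows are emitted, so skip the pushdown.
        return False
    return True
```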
   
   ## Are these changes tested?
   
   Yes. A `push_down_filter_parquet.slt` case reproduces it (build-side NULL, a 
non-matching parquet probe) and asserts the single correct row. Without the 
change, the query returns an extra NULL row.
   
   ## Are there any user-facing changes?
   
   `NOT IN` over a parquet (or otherwise prunable) scan with a nullable outer 
key now returns correct results. Such joins no longer get dynamic filter pushdown.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

