mdashti opened a new issue, #23103:
URL: https://github.com/apache/datafusion/issues/23103

   # What happens?
   
   A null-aware anti join (`NOT IN`) returns rows when its inner side has a 
NULL, instead of zero. With `enable_join_dynamic_filter_pushdown` on, 
`HashJoinExec` pushes a build-side filter (`key IN build_keys`) to the probe 
scan. That filter drops the probe's NULL row before the join's null-aware check 
runs, so `NOT IN` three-valued logic never collapses the result to empty.
   
   It only shows when the probe scan applies the filter at row level, e.g. 
parquet with `pushdown_filters = true`. An in-memory scan that ignores the 
pushed filter still returns the correct empty result.
   
   Reproduced on `54.0.0` and current `main`.
   
   ## To Reproduce
   
   ```sql
   set datafusion.optimizer.enable_join_dynamic_filter_pushdown = true;
   set datafusion.execution.parquet.pushdown_filters = true;
   
   create table outer_t(id int) as values (1), (2), (3);
   create table inner_t(eid int) as values (2), (null);
   
   copy outer_t to '/tmp/outer.parquet' stored as parquet;
   copy inner_t to '/tmp/inner.parquet' stored as parquet;
   
   create external table outer_p(id int) stored as parquet location 
'/tmp/outer.parquet';
   create external table inner_p(eid int) stored as parquet location 
'/tmp/inner.parquet';
   
   select id from outer_p where id not in (select eid from inner_p) order by id;
   ```
   
   Expected zero rows: a NULL in the inner set makes every `NOT IN` comparison 
unknown. Instead it returns:
   
   ```
   +----+
   | id |
   +----+
   | 1  |
   | 3  |
   +----+
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to