neilconway opened a new issue, #23126: URL: https://github.com/apache/datafusion/issues/23126
### Describe the bug In a predicate like `x NOT IN (subquery)`, `x` rows that are `NULL` should only be emitted if `subquery` is empty. `x NOT IN (subquery)` plans to a null-aware `LeftAnti` join (outer = build, subquery = probe). `enable_join_dynamic_filter_pushdown` (default on) pushes a bounds + membership filter, built from the outer key, onto the probe scan. This filter might eliminate all values from the probe-side input; this is interpreted by the `LeftAnti` join as indicating that the probe is actually empty, which means that build-side NULLs will incorrectly be emitted. The result is scan-dependent: a `VALUES` scan ignores the pushed filter and is correct, while a parquet scan applies it and is wrong. The probe can be emptied by either row-group/page pruning or by row-level filtering. ### To Reproduce ```sql create table ao(id int) as values (5), (null); -- outer: NULL + a non-NULL value create table i_disj(eid int) as values (2), (3); -- probe: non-empty, no match, no NULL -- (A) VALUES scan -> 5 (correct) select id from ao where id not in (select eid from i_disj) order by id; copy ao to '/tmp/ao.parquet' stored as parquet; copy i_disj to '/tmp/i_disj.parquet' stored as parquet; create external table ao_p(id int) stored as parquet location '/tmp/ao.parquet'; create external table i_disj_p(eid int) stored as parquet location '/tmp/i_disj.parquet'; -- (B) parquet scan, same query -> 5, NULL (wrong) select id from ao_p where id not in (select eid from i_disj_p) order by id; ``` ### Expected behavior _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
