neilconway opened a new issue, #23126:
URL: https://github.com/apache/datafusion/issues/23126

   ### Describe the bug
   
   In a predicate like `x NOT IN (subquery)`, rows where `x` is `NULL` should
   be emitted only if `subquery` returns no rows.
   
   `x NOT IN (subquery)` is planned as a null-aware `LeftAnti` join (outer =
   build, subquery = probe). With `enable_join_dynamic_filter_pushdown` (on by
   default), a bounds + membership filter built from the outer key is pushed
   onto the probe scan. This filter can eliminate every row from the probe-side
   input. The `LeftAnti` join then treats the probe as actually empty, so
   build-side NULLs are incorrectly emitted.
   
   The result is scan-dependent: a `VALUES` scan ignores the pushed filter and 
is correct, while a parquet scan applies it and is wrong. The probe can be 
emptied by either row-group/page pruning or by row-level filtering.
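
The mechanism described above can be modelled in a few lines of Python. This is only an illustrative sketch, not DataFusion's implementation: `not_in_rows` and `build_dynamic_filter` are hypothetical helper names, `None` stands in for SQL `NULL`, and the exact shape of the pushed filter (bounds plus membership over the non-NULL outer keys) is an assumption based on the description in this issue.

```python
def not_in_rows(outer, probe):
    """Return the outer rows for which `x NOT IN (probe)` evaluates to TRUE."""
    if len(probe) == 0:
        # Empty subquery: NOT IN is TRUE for every outer row, including NULL.
        return list(outer)
    probe_has_null = any(v is None for v in probe)
    values = {v for v in probe if v is not None}
    result = []
    for x in outer:
        if x is None:
            continue  # NULL NOT IN (non-empty set) is UNKNOWN, so the row is dropped
        if x in values:
            continue  # a match makes NOT IN FALSE
        if probe_has_null:
            continue  # no match, but a NULL in the probe makes the result UNKNOWN
        result.append(x)
    return result


def build_dynamic_filter(outer_keys):
    """Hypothetical bounds + membership filter built from the non-NULL outer keys."""
    keys = {k for k in outer_keys if k is not None}

    def keep(v):
        if v is None or not keys:
            return False
        return min(keys) <= v <= max(keys) and v in keys

    return keep


ao = [5, None]   # outer (build) side from the reproduction
i_disj = [2, 3]  # subquery (probe) side from the reproduction

# Scan that ignores the pushed filter: the probe is seen in full.
correct = not_in_rows(ao, i_disj)

# Scan that applies the pushed filter: every probe row is removed, and the
# join can no longer tell "filtered to empty" apart from "actually empty".
keep = build_dynamic_filter(ao)
filtered_probe = [v for v in i_disj if keep(v)]
buggy = not_in_rows(ao, filtered_probe)

print(correct)         # [5]
print(filtered_probe)  # []
print(buggy)           # [5, None]
```

In this model the wrong result comes only from evaluating the emptiness check on the filtered probe rather than on the original one, which matches the scan-dependent behaviour reported here.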
   
   ### To Reproduce
   
   ```sql
   create table ao(id int)      as values (5), (null);  -- outer: NULL + a non-NULL value
   create table i_disj(eid int) as values (2), (3);     -- probe: non-empty, no match, no NULL
   
   -- (A) VALUES scan -> 5  (correct)
   select id from ao where id not in (select eid from i_disj) order by id;
   
   copy ao     to '/tmp/ao.parquet'     stored as parquet;
   copy i_disj to '/tmp/i_disj.parquet' stored as parquet;
   create external table ao_p(id int)      stored as parquet location '/tmp/ao.parquet';
   create external table i_disj_p(eid int) stored as parquet location '/tmp/i_disj.parquet';
   
   -- (B) parquet scan, same query -> 5, NULL  (wrong)
   select id from ao_p where id not in (select eid from i_disj_p) order by id;
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

