nuno-faria opened a new issue, #20669:
URL: https://github.com/apache/datafusion/issues/20669

   ### Describe the bug
   
   This is a regression found on `branch-53`.
   
   When a physical right join (e.g., `RightAnti`) is executed with an 
empty output schema (e.g., for `count(*)` queries), it returns incorrect results. The 
issue comes from the `build_batch_from_indices` function:
   
https://github.com/apache/datafusion/blob/d1a305821f4d883a2a1e13d0bbc54af43d6d0441/datafusion/physical-plan/src/joins/utils.rs#L913-L925
   
   If the schema is empty, a zero-column batch with `build_indices.len()` rows is 
returned. However, for join types such as `RightAnti`, the row count should 
instead come from the probe side.
   
   One possible fix is to pass the join type to 
`build_batch_from_indices`, since I don't think that information is available from the 
current arguments. However, this might require a large number of changes.
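
To make the idea concrete, here is a minimal, standalone sketch of the row-count decision for the empty-schema case. It is not DataFusion code: the `JoinType` enum below is a reduced stand-in, `empty_schema_row_count` is a hypothetical helper, and treating `RightSemi` the same way as `RightAnti` is an assumption on my part.

```rust
/// Reduced, hypothetical stand-in for the join type enum (illustration only).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// Hypothetical helper: picks the row count of a zero-column output batch.
///
/// The current behavior described above corresponds to always returning
/// `build_indices_len`. With the join type available, joins whose output
/// rows are driven by the probe side can use `probe_indices_len` instead.
fn empty_schema_row_count(
    join_type: JoinType,
    build_indices_len: usize,
    probe_indices_len: usize,
) -> usize {
    match join_type {
        // Assumption: these variants emit one row per surviving probe index.
        JoinType::RightSemi | JoinType::RightAnti => probe_indices_len,
        // All other variants keep the existing build-side count.
        _ => build_indices_len,
    }
}
```

With the reproduction below, the build side would contribute 1 index and the probe side 99, so a helper like this would report 99 rows for `RightAnti` rather than 1.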
   
   ### To Reproduce
   
   ```sql
   create table t1 (k int, v int);
   create table t2 (k int, v int);
   insert into t1 select i as k, i as v from generate_series(1, 100) t(i);
   insert into t2 values (1, 1);
   
   -- select * is ok
   with t as (
       select *
       from t1
       left anti join t2 on t1.k = t2.k
   )
   select *
   from t;
   +----+----+
   | k  | v  |
   +----+----+
   | 2  | 2  |
   | 3  | 3  |
   | 4  | 4  |
   | 5  | 5  |
   ...
   +----+----+
   99 row(s) fetched.
   
   -- select count(*) is wrong
   with t as (
       select *
       from t1
       left anti join t2 on t1.k = t2.k
   )
   select count(*)
   from t;
   +----------+
   | count(*) |
   +----------+
   | 1        |
   +----------+
   1 row(s) fetched.
   ```
   
   ### Expected behavior
   
   `count(*)` should return the correct number of rows (99 in the example above, matching `select *`).
   
   ### Additional context
   
   This bug was exposed by the empty project optimization done by #20191, but 
the behavior of always returning the number of rows from the build side already 
existed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

