nathanb9 opened a new issue, #22881:
URL: https://github.com/apache/datafusion/issues/22881

   ## Describe the bug
   
   Per SQL:2016 §7.10, a `USING` / `NATURAL` join column is `COALESCE(left.k, 
right.k)`. For a right-only row of a `RIGHT`/`FULL` outer join the left key is 
NULL-padded, so the merged key must take the **right** key's value. DataFusion 
instead resolves the unqualified key to the **left** column only, so it comes 
out `NULL` for right-only rows.
   
   This is silent (the query plans and runs fine) and it corrupts downstream 
`WHERE` and `GROUP BY` on the key. `LEFT JOIN USING` / `INNER` are unaffected — 
the defect only appears when the left side is NULL-padded (`RIGHT`/`FULL`).
   
   ## To Reproduce
   
   ```sql
   create table a(k int, x int) as values (1, 10);
   create table b(k int, y int) as values (4, 400);
   
   -- WRONG: right-only row reports k = NULL instead of 4
   select k from a full join b using (k) order by k nulls last;
   -- actual:   1, NULL      expected: 1, 4
   
   -- Diagnosis: the unqualified `k` is bound to a.k, not COALESCE(a.k, b.k)
   select k as unqual_k, a.k as a_k, b.k as b_k
   from a full join b using (k) order by unqual_k nulls last;
   -- unqual_k | a_k  | b_k
   --    1     |  1   | NULL
   --   NULL   | NULL |  4      <-- unqual_k should be 4 (= b.k)
   
   -- Same defect for RIGHT JOIN USING and NATURAL RIGHT/FULL JOIN:
   select k from a right join b using (k);                     -- right-only 
row: NULL (should be 4)
   select k from a natural full join b order by k nulls last;  -- NULL (should 
be 4)
   ```
   
   The standard-equivalent explicit form is correct, confirming the divergence:
   
   ```sql
   select coalesce(a.k, b.k) as k, a.x, b.y
   from a full join b on a.k = b.k order by k nulls last;
   -- 1, 10, NULL
   -- 4, NULL, 400   <-- k = 4 here (correct)
   ```
   
   ### Downstream impact (beyond display)
   
   The wrong NULL key propagates into filters and grouping:
   
   ```sql
   -- drops a real row (or errors on `Ambiguous reference` for SELECT *):
   select * from a full join b using (k) where k = 4;          -- should be 1 
row
   -- buckets every right-only row under NULL instead of its real key:
   select k, count(*) from a full join b using (k) group by k; -- right-only 
rows land under k=NULL
   ```
   
   ## Expected behavior
   
   The merged `USING`/`NATURAL` key column should equal `COALESCE(left.k, 
right.k)` for all join types, so right-only rows of a `RIGHT`/`FULL` join 
expose the right key value (matching the explicit `SELECT coalesce(a.k, b.k) 
... ON a.k = b.k` form, and matching PostgreSQL / the SQL standard).
   
   ## Additional context
   
   The logical plan binds the merged key to the left column with no `COALESCE`:
   
   ```
   > explain format indent select k from a full join b using (k);
   
   Projection: a.k                       <-- unqualified k bound to a.k (not 
coalesce(a.k, b.k))
     Full Join: Using a.k = b.k
       TableScan: a projection=[k]
       TableScan: b projection=[k]
   ```
   
   The `USING`/`NATURAL` constraint is lowered in 
`datafusion/sql/src/relation/join.rs` (`JoinConstraint::Using` and 
`JoinConstraint::Natural`, both calling `join_using(...)`); the 
merged/unqualified key resolves to the left key during column/wildcard 
resolution rather than to `COALESCE(left.k, right.k)`.
   
   A correct fix exposes a single coalesced key for `USING`/`NATURAL` 
constraints — resolving the merged column to `COALESCE(left.k, right.k)` during 
column/wildcard resolution rather than aliasing to `left.k`. There is currently 
**no sqllogictest coverage** for `RIGHT`/`FULL` `USING`/`NATURAL`, which is why 
it went unnoticed.
   
   **Version:** reproduced on `main` (DataFusion 54.0.0 dev) via 
`datafusion-cli`; originally observed on 53.1.0.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to