Phoenix500526 commented on PR #22998:
URL: https://github.com/apache/datafusion/pull/22998#issuecomment-4738497299

   > Thanks for sending this PR! Overall I agree that the current behavior is 
wrong and should be fixed.
   > 
   > I think the current approach produces an error for queries like `SELECT k 
FROM a FULL JOIN b USING(k) ORDER BY a.k`, because the sort introduces a hidden 
projection column.
   
   Hi, @neilconway  I dug into this one. The failure is a schema clash: `ORDER 
BY a.k` drags `a.k` into the same projection as the merged key `k`, and for a 
`FULL join k` is `COALESCE(a.k, b.k) `(unqualified). DFSchema won't allow an 
unqualified `k` next to a qualified `a.k` — that's the "would be ambiguous" 
error. 
   
   I found two ways out, neither perfectly clean.
   
   One is to give the merged key a fake qualifier, e.g. `COALESCE(a.k, b.k) AS 
__join_virtual_table__.k`, so it's qualified and stops clashing. That actually 
fixes both `ORDER BY a.k` and `SELECT k, a.k`. The catch is the fake table 
leaks: you see it in `EXPLAIN`, and unparse falls over because the qualifier 
points at nothing real — plan_to_sql emits SQL referencing a table that doesn't 
exist and it won't re-plan:
   
   ```sql
   SELECT __join_virtual_table__.order_id FROM (
     SELECT CASE WHEN o1.order_id IS NOT NULL THEN o1.order_id ELSE o2.order_id 
END AS order_id, o1.order_id
     FROM orders o1 FULL JOIN orders o2 USING(order_id) ORDER BY o1.order_id)
   -- re-plan: qualified field name o1.order_id and unqualified field name 
order_id would be ambiguous
   ```
   
   The other is to push the Sort below the merged-key projection so it reads 
a.k straight off the join:
   ```text
   Projection: COALESCE(a.k, b.k) AS k
     Sort: a.k
       Full Join: Using a.k = b.k
   ```
   Clean plan, fixes the `ORDER BY` case. But now the `Sort` sits directly on 
the join with no projection, so unparse falls back to `SELECT *, *` (one star 
per side), which expands the merged key twice and again won't re-plan:
   
   ```sql
   SELECT ... FROM (SELECT *, * FROM orders o1 FULL JOIN orders o2 
USING(order_id) ORDER BY o1.order_id)
   -- re-plan: Projections require unique expression names ... order_id ... 
order_id
   ```
   So both fix execution but both trip up unparse — a FULL USING merged key is 
a COALESCE the unparser doesn't turn back into USING. Neither feels clean 
enough to me — do you have a better approach in mind for this case?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to