Phoenix500526 commented on PR #22998:
URL: https://github.com/apache/datafusion/pull/22998#issuecomment-4738497299
> Thanks for sending this PR! Overall I agree that the current behavior is
wrong and should be fixed.
>
> I think the current approach produces an error for queries like `SELECT k
FROM a FULL JOIN b USING(k) ORDER BY a.k`, because the sort introduces a hidden
projection column.
Hi, @neilconway I dug into this one. The failure is a schema clash: `ORDER
BY a.k` drags `a.k` into the same projection as the merged key `k`, and for a
`FULL join k` is `COALESCE(a.k, b.k) `(unqualified). DFSchema won't allow an
unqualified `k` next to a qualified `a.k` — that's the "would be ambiguous"
error.
I found two ways out, neither perfectly clean.
One is to give the merged key a fake qualifier, e.g. `COALESCE(a.k, b.k) AS
__join_virtual_table__.k`, so it's qualified and stops clashing. That actually
fixes both `ORDER BY a.k` and `SELECT k, a.k`. The catch is the fake table
leaks: you see it in `EXPLAIN`, and unparse falls over because the qualifier
points at nothing real — plan_to_sql emits SQL referencing a table that doesn't
exist and it won't re-plan:
```sql
SELECT __join_virtual_table__.order_id FROM (
SELECT CASE WHEN o1.order_id IS NOT NULL THEN o1.order_id ELSE o2.order_id
END AS order_id, o1.order_id
FROM orders o1 FULL JOIN orders o2 USING(order_id) ORDER BY o1.order_id)
-- re-plan: qualified field name o1.order_id and unqualified field name
order_id would be ambiguous
```
The other is to push the Sort below the merged-key projection so it reads
a.k straight off the join:
```text
Projection: COALESCE(a.k, b.k) AS k
Sort: a.k
Full Join: Using a.k = b.k
```
Clean plan, fixes the `ORDER BY` case. But now the `Sort` sits directly on
the join with no projection, so unparse falls back to `SELECT *, *` (one star
per side), which expands the merged key twice and again won't re-plan:
```sql
SELECT ... FROM (SELECT *, * FROM orders o1 FULL JOIN orders o2
USING(order_id) ORDER BY o1.order_id)
-- re-plan: Projections require unique expression names ... order_id ...
order_id
```
So both fix execution but both trip up unparse — a FULL USING merged key is
a COALESCE the unparser doesn't turn back into USING. Neither feels clean
enough to me — do you have a better approach in mind for this case?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]