Phoenix500526 commented on PR #22998:
URL: https://github.com/apache/datafusion/pull/22998#issuecomment-4770002762
> @Phoenix500526 Thanks for revising this! The `FULL` case needs some more
consideration -- I'll dig into it and try to respond on Monday.
@neilconway Thanks so much for offering to dig into the `FULL` case! Good
news though: I kept poking at it and found a third approach that fixes `FULL`
and survives the unparse round-trip, so you're off the hook 😄 I'd still love to
hear what you think, of course.
Quick rundown:
**Idea: rename the merged key to a reserved internal *name*, not a fake
qualifier.**
Both of my earlier attempts died at unparse because a `FULL USING` merged
key is a `COALESCE` the unparser can't turn back into `USING`. So instead of
dodging the `{ k, a.k }` clash with a phantom qualifier, I dodge it with a
plain reserved name.
During sort push-down, when a qualified sort column (`a.k`) is about to be
folded into a projection that already exposes the unqualified merged key `k`, I
rename that merged key to `__datafusion_merged_key_k` first, then restore the
original name in a wrapper projection:
```text
Projection: __datafusion_merged_key_k AS k -- wrapper
restores `k`
Sort: a.k DESC
Projection: CASE WHEN a.k IS NOT NULL THEN a.k ELSE b.k END AS
__datafusion_merged_key_k, a.k
Full Join: Using a.k = b.k
```
The folded schema is now `{ __datafusion_merged_key_k, a.k }`— distinct
names, no ambiguity. If the sort itself also references the merged key (`ORDER
BY a.k, k`), I rewrite that reference to the same internal name so the
multi-key sort keeps resolving. (`ORDER BY a.k, k` was legal before the
merged-key work, so this keeps it from regressing.)
I've added round-trip tests (`USING` and `NATURAL FULL`) plus execution
coverage for `ORDER BY a.k` and `ORDER BY a.k, k`.
One loose end I should flag: the internal name leaks into EXPLAIN. The
result column is k and the unparsed SQL is clean, but the intermediate plan
nodes still show `__datafusion_merged_key_k`:
```text
logical_plan
01)Projection: __datafusion_merged_key_k AS k
02)--Sort: a.k DESC NULLS LAST
03)----Projection: CASE WHEN a.k IS NOT NULL THEN a.k ELSE b.k END AS
__datafusion_merged_key_k, a.k
...
It's purely cosmetic — no effect on results or round-trip SQL — but it's not
the prettiest. Hiding it completely would mean touching the generic plan
Display, which felt like overkill, so I left it for now. If you spot a cleaner
way, I'm all ears.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]