Phoenix500526 commented on PR #22998:
URL: https://github.com/apache/datafusion/pull/22998#issuecomment-4770002762

   > @Phoenix500526 Thanks for revising this! The `FULL` case needs some more 
consideration -- I'll dig into it and try to respond on Monday.
   
   @neilconway Thanks so much for offering to dig into the `FULL` case! Good 
news though: I kept poking at it and found a third approach that fixes `FULL` 
and survives the unparse round-trip, so you're off the hook 😄 I'd still love to 
hear what you think, of course. 
   
   Quick rundown:
   
   **Idea: rename the merged key to a reserved internal *name*, not a fake 
qualifier.**
   
   Both of my earlier attempts died at unparse because a `FULL USING` merged 
key is a `COALESCE` the unparser can't turn back into `USING`. So instead of 
dodging the `{ k, a.k }` clash with a phantom qualifier, I dodge it with a 
plain reserved name.
   
   During sort push-down, when a qualified sort column (`a.k`) is about to be 
folded into a projection that already exposes the unqualified merged key `k`, I 
rename that merged key to `__datafusion_merged_key_k` first, then restore the 
original name in a wrapper projection:
   
   ```text
   Projection: __datafusion_merged_key_k AS k                    -- wrapper 
restores `k`
     Sort: a.k DESC
       Projection: CASE WHEN a.k IS NOT NULL THEN a.k ELSE b.k END AS 
__datafusion_merged_key_k, a.k
         Full Join: Using a.k = b.k
   ```
   
   The folded schema is now `{ __datafusion_merged_key_k, a.k }`— distinct 
names, no ambiguity. If the sort itself also references the merged key (`ORDER 
BY a.k, k`), I rewrite that reference to the same internal name so the 
multi-key sort keeps resolving. (`ORDER BY a.k, k` was legal before the 
merged-key work, so this keeps it from regressing.)
   
   
   I've added round-trip tests (`USING` and `NATURAL FULL`) plus execution 
coverage for `ORDER BY a.k` and `ORDER BY a.k, k`.
   
   One loose end I should flag: the internal name leaks into EXPLAIN. The 
result column is k and the unparsed SQL is clean, but the intermediate plan 
nodes still show `__datafusion_merged_key_k`:
   
   ```text
   logical_plan
   01)Projection: __datafusion_merged_key_k AS k
   02)--Sort: a.k DESC NULLS LAST
   03)----Projection: CASE WHEN a.k IS NOT NULL THEN a.k ELSE b.k END AS 
__datafusion_merged_key_k, a.k
   ...
   
   It's purely cosmetic — no effect on results or round-trip SQL — but it's not 
the prettiest. Hiding it completely would mean touching the generic plan 
Display, which felt like overkill, so I left it for now. If you spot a cleaner 
way, I'm all ears.
   
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to