kosiew commented on issue #1305:
URL: https://github.com/apache/datafusion-python/issues/1305#issuecomment-3702139803

The current Python join wrapper drops identically named key columns by default for all join types, including full: when drop_duplicate_keys is true, it projects away one side's qualified key column.
   
https://github.com/apache/datafusion-python/blob/fcd70567dedc580416c2931cc7f25e3960704ace/src/dataframe.rs#L650-L715
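To make the failure mode concrete, here is a minimal pure-Python sketch (not the datafusion-python implementation) of a full outer join that keeps both qualified key columns, followed by a projection that drops one of them. The helpers full_join and drop_right_key, the l./r. column prefixes, and the choice to drop the right-hand key are all assumptions made for illustration. The point is only that a right-only row ends up with no key value at all.

```python
from typing import Any

# Hypothetical row representation: column name -> value, None standing in for NULL.
Row = dict[str, Any]


def full_join(left: list[Row], right: list[Row], key: str,
              left_cols: list[str], right_cols: list[str]) -> list[Row]:
    """Naive nested-loop FULL OUTER JOIN on `key` (illustrative only).

    Output columns are qualified as 'l.<col>' / 'r.<col>', so both key
    columns survive; the unmatched side of a row is padded with None.
    """
    out: list[Row] = []
    right_matched = [False] * len(right)
    for lrow in left:
        matched = False
        for j, rrow in enumerate(right):
            # NULL keys never match, as with SQL equality.
            if lrow[key] is not None and lrow[key] == rrow[key]:
                matched = True
                right_matched[j] = True
                out.append({**{f"l.{c}": lrow[c] for c in left_cols},
                            **{f"r.{c}": rrow[c] for c in right_cols}})
        if not matched:
            # Left-only row: every right column, including r.<key>, is NULL.
            out.append({**{f"l.{c}": lrow[c] for c in left_cols},
                        **{f"r.{c}": None for c in right_cols}})
    for j, rrow in enumerate(right):
        if not right_matched[j]:
            # Right-only row: every left column, including l.<key>, is NULL.
            out.append({**{f"l.{c}": None for c in left_cols},
                        **{f"r.{c}": rrow[c] for c in right_cols}})
    return out


def drop_right_key(rows: list[Row], key: str) -> list[Row]:
    """Hypothetical stand-in for 'drop duplicate keys': project away r.<key>."""
    return [{c: v for c, v in row.items() if c != f"r.{key}"} for row in rows]


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]

joined = full_join(left, right, "id", ["id", "a"], ["id", "b"])
dropped = drop_right_key(joined, "id")
for row in dropped:
    print(row)
# The last row is {'l.id': None, 'l.a': None, 'r.b': 'q'}:
# the key value 3 from the right-only row is gone.
```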
   
   
The user guide also documents this default key-dropping behavior without noting any exception for particular join types.
   
https://github.com/apache/datafusion-python/blob/fcd70567dedc580416c2931cc7f25e3960704ace/docs/source/user-guide/common-operations/joins.rst#L109-L136
   
Given that, I agree with @renato’s concern: in a full outer join the two key columns are not equivalent. For a row that exists only in the right table, the left key column is null and only the right key column holds the key value (and vice versa for left-only rows), so dropping either column silently loses the keys of unmatched rows. Disallowing drop_duplicate_keys=True for how="full" (or forcing it to False) would avoid this silent data loss and keep both key columns available for the user to coalesce or rename afterward, which aligns with SQL-style expectations.
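One way the proposed guard could look, sketched as a hypothetical validation helper: resolve_drop_duplicate_keys and its strict flag are illustrative names, not part of the current datafusion-python API. It either rejects drop_duplicate_keys=True for how="full" or forces the flag to False, and passes it through unchanged for every other join type.

```python
def resolve_drop_duplicate_keys(how: str, drop_duplicate_keys: bool,
                                *, strict: bool = True) -> bool:
    """Hypothetical pre-join check for the drop_duplicate_keys flag.

    strict=True  -> reject drop_duplicate_keys=True for full joins.
    strict=False -> force it to False for full joins.
    Other join types keep the flag unchanged.
    """
    if how == "full" and drop_duplicate_keys:
        if strict:
            raise ValueError(
                "drop_duplicate_keys=True is not supported for how='full'; "
                "both key columns are needed to keep unmatched rows' keys"
            )
        return False
    return drop_duplicate_keys


print(resolve_drop_duplicate_keys("left", True))                # True
print(resolve_drop_duplicate_keys("full", True, strict=False))  # False
try:
    resolve_drop_duplicate_keys("full", True)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Because drop_duplicate_keys defaults to True, the strict variant would make a plain how="full" call raise unless the caller passes False explicitly; forcing the flag to False avoids that breakage, at the cost of quietly returning a different set of columns than the default promises.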
   
@mesejo’s coalesce-based approach also closes the correctness gap and matches the behavior of other libraries. If we keep drop_duplicate_keys=True as the default, coalescing the two key columns for outer joins, instead of dropping one side, would preserve every row's key while still returning a single key column. The trade-off is slightly higher compute cost and less user control over how each key column is treated. Either way, explicitly documenting the chosen semantics for full joins will help set user expectations.
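A rough sketch of the coalesce-based alternative, again in plain Python rather than DataFusion expressions. The hard-coded rows stand in for the result of a full outer join on id that kept both qualified key columns (the l.id / r.id naming is an assumption). The hypothetical coalesce_keys helper replaces them with a single id column holding the first non-null value, mirroring SQL COALESCE(l.id, r.id).

```python
from typing import Any

# Hypothetical row representation: column name -> value, None standing in for NULL.
Row = dict[str, Any]


def coalesce_keys(rows: list[Row], key: str) -> list[Row]:
    """Replace 'l.<key>' and 'r.<key>' with one unqualified key column
    holding the first non-null value, like COALESCE(l.key, r.key)."""
    lk, rk = f"l.{key}", f"r.{key}"
    out: list[Row] = []
    for row in rows:
        merged = row[lk] if row[lk] is not None else row[rk]
        rest = {c: v for c, v in row.items() if c not in (lk, rk)}
        out.append({key: merged, **rest})
    return out


# Assumed output of a full outer join on "id" that kept both key columns.
joined = [
    {"l.id": 1, "l.a": "x", "r.id": None, "r.b": None},    # left-only
    {"l.id": 2, "l.a": "y", "r.id": 2, "r.b": "p"},        # matched
    {"l.id": None, "l.a": None, "r.id": 3, "r.b": "q"},    # right-only
]
for row in coalesce_keys(joined, "id"):
    print(row)
```

Because matched rows in an equi-join carry equal keys on both sides, taking the first non-null value keeps every key. What is lost is the ability to tell from the key column alone which side an unmatched row came from, which is the reduced per-column control mentioned above.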

