[GitHub] [arrow] amol- edited a comment on pull request #12452: ARROW-14292: [C++][Python] Join foundation for Tables

GitBox Fri, 11 Mar 2022 05:35:47 -0800


amol- edited a comment on pull request #12452:
URL: https://github.com/apache/arrow/pull/12452#issuecomment-1064943267



   > I would personally prefer to see this comment addressed as well (or at 
least get some thoughts on it):
   > 
   > > You also need to specify the key column for both the left and right table 
separately. While this is certainly the most generic (since it can handle 
different names in left and right table), I think it could also be nice to give 
the user the possibility to just give one name (or list of names) in case it is 
the same in left/right table (for better ergonomics when using this method)
   > 
   
   I'll add support for omitting the right table keys and suffixing columns in 
the output as supported by HashJoinNodeOptions.
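
A minimal sketch of the intended ergonomics, in plain Python rather than the actual implementation. The helper names `resolve_join_keys` and `suffix_output_names`, and the default `_t2` suffix, are hypothetical and only illustrate the idea: right keys fall back to the left keys when omitted, and only clashing column names get a suffix.

```python
def resolve_join_keys(keys, right_keys=None):
    """Return (left_keys, right_keys) as lists.

    Hypothetical helper: when right_keys is omitted, the same names
    are used for both tables.
    """
    left = [keys] if isinstance(keys, str) else list(keys)
    if right_keys is None:
        return left, list(left)
    right = [right_keys] if isinstance(right_keys, str) else list(right_keys)
    if len(left) != len(right):
        raise ValueError("left and right key lists must have the same length")
    return left, right


def suffix_output_names(left_names, right_names,
                        left_suffix="", right_suffix="_t2"):
    """Append a suffix only to column names present in both tables."""
    clashes = set(left_names) & set(right_names)
    left_out = [n + left_suffix if n in clashes else n for n in left_names]
    right_out = [n + right_suffix if n in clashes else n for n in right_names]
    return left_out, right_out
```

With this, a caller could pass a single name such as `"Key"` and still get distinct output columns (`Key` and `Key_t2`) when both tables carry it.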
   
   > For the join keys columns in the output: you now selected one of the 
columns for most joins, but not for outer join, I think? I am not fully sure if 
we should do something different here for outer join (for example, both pandas 
and dplyr will only have a single key column in the output also in the case of 
an outer join)
   
   That's an interesting point. Personally, I think that for outer joins it 
makes a lot of sense to keep both columns. Coalescing the key columns would 
lose the information about which table each key came from. I think it's more 
reasonable to let users decide whether they want to coalesce outer join keys, 
especially given that the coalesce operation would add a cost, as we don't 
provide it in joins out of the box.
   
   For example
   ```
   Key | Key_t2 | Other
    1  |  null  |  55            <---- Obvious that the value comes from Table1 and not Table2
   ```
   VS
   ```
   Key | Other
    1  |  55                     <---- Where did "1" come from?
   ```
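
To make the trade-off concrete, here is a small plain-Python sketch (not the Arrow implementation). The functions `full_outer_join` and `coalesce_keys` are hypothetical names, rows are modelled as dicts, and keys are assumed to be unique within each table. The join keeps one key column per side; coalescing is a separate, optional step that the user pays for only if they want it.

```python
def full_outer_join(left, right, key="Key", right_suffix="_t2"):
    """Full outer join of two lists of row dicts, keeping both key columns.

    Assumes keys are unique within each table. Non-key columns that are
    absent for an unmatched row are simply left out of that row.
    """
    right_key = key + right_suffix
    right_by_key = {row[key]: row for row in right}
    matched = set()
    out = []
    for lrow in left:
        rrow = right_by_key.get(lrow[key])
        joined = dict(lrow)
        if rrow is None:
            joined[right_key] = None  # key exists only in the left table
        else:
            matched.add(lrow[key])
            joined[right_key] = rrow[key]
            joined.update({c: v for c, v in rrow.items() if c != key})
        out.append(joined)
    for rrow in right:
        if rrow[key] not in matched:
            # key exists only in the right table
            joined = {key: None, right_key: rrow[key]}
            joined.update({c: v for c, v in rrow.items() if c != key})
            out.append(joined)
    return out


def coalesce_keys(rows, key="Key", right_suffix="_t2"):
    """Optional extra pass: merge both key columns into a single one."""
    right_key = key + right_suffix
    out = []
    for row in rows:
        merged = {c: v for c, v in row.items() if c != right_key}
        if merged[key] is None:
            merged[key] = row[right_key]
        out.append(merged)
    return out


table1 = [{"Key": 1, "Other": 55}]
table2 = [{"Key": 2, "Value": 7}]
joined = full_outer_join(table1, table2)
coalesced = coalesce_keys(joined)
```

After `coalesce_keys` the rows no longer record which table each key came from, which is the information loss described above.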


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
