amol- commented on pull request #12452: URL: https://github.com/apache/arrow/pull/12452#issuecomment-1064943267
> I would personally prefer to see this comment addressed as well (or at least get some thoughts on it):
>
> > You also need to specify the key column for both left and right table separately. While this is certainly the most generic (since it can handle different names in left and right table), I think it could also be nice to give the user the possibility to just give one name (or list of names) in case it is the same in left/right table (for better ergonomics when using this method)
>
> I'll add support for suffixing columns in the output as supported by HashJoinNodeOptions.
>
> For the join keys columns in the output: you now selected one of the columns for most joins, but not for outer join, I think? I am not fully sure if we should do something different here for outer join (for example, both pandas and dplyr will only have a single key column in the output also in the case of an outer join)

That's an interesting point. Personally, I think that for outer joins it makes a lot of sense to keep both key columns: coalescing them would lose the information about which table each key came from. I think it's more reasonable to let users decide whether to coalesce the outer join keys, especially since coalescing would add a cost, as we don't provide it in joins out of the box.
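To make the trade-off concrete, here is a minimal pure-Python sketch of a full outer join over lists of row dicts. It is not the pyarrow implementation under review: the function `full_outer_join` and its `right_key` and `coalesce_keys` parameters are hypothetical, chosen to mirror the two points discussed (defaulting the right key to the left key's name, and making key coalescing opt-in). It assumes non-key column names differ between the two tables.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Row = Dict[str, Any]


def full_outer_join(
    left: List[Row],
    right: List[Row],
    key: str,
    right_key: Optional[str] = None,
    coalesce_keys: bool = False,
) -> List[Row]:
    """Hypothetical full outer join of two row lists on one key column.

    Assumes non-key column names differ between the two tables.
    """
    # Ergonomics: a single key name is reused for both sides by default.
    right_key = right_key if right_key is not None else key
    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != right_key]

    # Index right rows by key value so each left row can find its matches.
    right_index: Dict[Any, List[int]] = {}
    for i, rrow in enumerate(right):
        right_index.setdefault(rrow[right_key], []).append(i)

    pairs: List[Tuple[Optional[Row], Optional[Row]]] = []
    matched: Set[int] = set()
    for lrow in left:
        hits = right_index.get(lrow[key], [])
        for i in hits:
            matched.add(i)
            pairs.append((lrow, right[i]))
        if not hits:
            pairs.append((lrow, None))  # left-only row
    for i, rrow in enumerate(right):
        if i not in matched:
            pairs.append((None, rrow))  # right-only row

    out: List[Row] = []
    for lrow, rrow in pairs:
        lkey = lrow[key] if lrow is not None else None
        rkey = rrow[right_key] if rrow is not None else None
        if coalesce_keys:
            # One key column: simpler output, but which side matched is lost.
            row: Row = {key: lkey if lkey is not None else rkey}
        else:
            # Both key columns: a None shows which table lacked the key.
            row = {"left_" + key: lkey, "right_" + right_key: rkey}
        for c in left_cols:
            row[c] = lrow[c] if lrow is not None else None
        for c in right_cols:
            row[c] = rrow[c] if rrow is not None else None
        out.append(row)
    return out


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": 10}, {"id": 3, "b": 20}]
print(full_outer_join(left, right, "id"))
# [{'left_id': 1, 'right_id': None, 'a': 'x', 'b': None},
#  {'left_id': 2, 'right_id': 2, 'a': 'y', 'b': 10},
#  {'left_id': None, 'right_id': 3, 'a': None, 'b': 20}]
print(full_outer_join(left, right, "id", coalesce_keys=True))
# [{'id': 1, 'a': 'x', 'b': None},
#  {'id': 2, 'a': 'y', 'b': 10},
#  {'id': 3, 'a': None, 'b': 20}]
```

With both key columns kept, the `None` in `left_id` or `right_id` records which table lacked the key. After coalescing, that provenance is only recoverable indirectly, for example from which payload columns are null.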
