[
https://issues.apache.org/jira/browse/ARROW-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tobias Zagorni reassigned ARROW-15957:
--------------------------------------
Assignee: Tobias Zagorni
> [C++] Add option to consolidate key columns in hash join
> --------------------------------------------------------
>
> Key: ARROW-15957
> URL: https://issues.apache.org/jira/browse/ARROW-15957
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Weston Pace
> Assignee: Tobias Zagorni
> Priority: Major
>
> Currently the hash join outputs the key columns from both sides. In an outer
> join this helps distinguish a row that matched but had entirely null payloads
> on one side from a row that had no match on that side.
> However, that distinction is sometimes not very important and many databases
> will simply coalesce the key columns into one. For example, we might get an
> outer join result today that looks like:
> {noformat}
> L_KEY | R_KEY | L_PAY | R_PAY
> 0     | 0     | x     | Y
> NULL  | 1     | NULL  | Z
> 2     | NULL  | A     | NULL
> {noformat}
> Ideally we could specify a "combine key columns" option to get a result that
> looks like:
> {noformat}
> KEY | L_PAY | R_PAY
> 0   | x     | Y
> 1   | NULL  | Z
> 2   | A     | NULL
> {noformat}
> This can be done today with an extra project step, and a built-in option
> isn't likely to offer much performance benefit, but from a usability
> perspective it would be nice if users didn't have to add that step.
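
To make the requested semantics concrete, here is a minimal sketch in plain C++17 of what consolidating the key columns amounts to: for each output row, take the left key if it is non-null, otherwise the right key. It deliberately does not use the Arrow API. `CoalesceKeys` is a hypothetical helper, and `std::optional<int>` stands in for a nullable key column. In Arrow itself the extra project step would presumably use a coalesce-style expression over the two key fields.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not part of Arrow: merges the left and right key
// columns of a join result into a single key column. For each row it keeps
// the left key when that key is non-null and falls back to the right key
// otherwise (the result is null only if both keys are null).
std::vector<std::optional<int>> CoalesceKeys(
    const std::vector<std::optional<int>>& left_keys,
    const std::vector<std::optional<int>>& right_keys) {
  // Both columns come from the same join output, so they have equal length.
  assert(left_keys.size() == right_keys.size());

  std::vector<std::optional<int>> merged;
  merged.reserve(left_keys.size());
  for (std::size_t i = 0; i < left_keys.size(); ++i) {
    merged.push_back(left_keys[i].has_value() ? left_keys[i] : right_keys[i]);
  }
  return merged;
}
```

Applied to the example above, with L_KEY as (0, NULL, 2) and R_KEY as (0, 1, NULL), this yields the single KEY column (0, 1, 2). The payload columns pass through unchanged.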
--
This message was sent by Atlassian Jira
(v8.20.1#820001)