[ 
https://issues.apache.org/jira/browse/ARROW-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tobias Zagorni reassigned ARROW-15957:
--------------------------------------

    Assignee: Tobias Zagorni

> [C++] Add option to consolidate key columns in hash join
> --------------------------------------------------------
>
>                 Key: ARROW-15957
>                 URL: https://issues.apache.org/jira/browse/ARROW-15957
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Tobias Zagorni
>            Priority: Major
>
> Currently the hash join outputs key columns from both sides.  In an outer 
> join this helps distinguish a row that matched but had entirely null 
> payloads on one side from a row that had no match on that side.
> However, that distinction is sometimes not very important and many databases 
> will simply coalesce the key columns into one.  For example, we might get an 
> outer join result today that looks like:
> {noformat}
> L_KEY | R_KEY | L_PAY | R_PAY
>     0       0       x       Y
>  NULL       1    NULL       Z
>     2    NULL       A    NULL
> {noformat}
> Ideally we could specify a "combine key columns" option to get a result that 
> looks like:
> {noformat}
> KEY | L_PAY | R_PAY
>   0       x       Y
>   1    NULL       Z
>   2       A    NULL
> {noformat}
> This can be done today with an extra project step, so a built-in option 
> isn't likely to offer much performance benefit, but from a usability 
> perspective it would be nice if users didn't have to add that step.
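
To make the requested behaviour concrete, here is a minimal sketch of the
coalescing that the extra project step performs: take the left key when it is
non-null, otherwise the right key. It uses plain C++17 and std::optional
rather than the Arrow API. JoinedRow, ConsolidatedRow, ConsolidateKeys and
ExampleOuterJoinOutput are hypothetical names made up for illustration, and
nothing here reflects how the hash join node would actually implement the
option. In Arrow itself, the equivalent would presumably be a projection using
the `coalesce` compute function on the two key columns.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical row layout of the current outer join output (not an Arrow type).
// std::nullopt stands in for NULL.
struct JoinedRow {
  std::optional<int> l_key;
  std::optional<int> r_key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Hypothetical row layout after the key columns are consolidated.
struct ConsolidatedRow {
  std::optional<int> key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Coalesce the two key columns: use the left key if it is non-null,
// otherwise fall back to the right key. Payload columns pass through.
std::vector<ConsolidatedRow> ConsolidateKeys(const std::vector<JoinedRow>& rows) {
  std::vector<ConsolidatedRow> out;
  out.reserve(rows.size());
  for (const JoinedRow& row : rows) {
    const std::optional<int>& key = row.l_key.has_value() ? row.l_key : row.r_key;
    out.push_back(ConsolidatedRow{key, row.l_pay, row.r_pay});
  }
  return out;
}

// The three-row outer join result from the issue description.
std::vector<JoinedRow> ExampleOuterJoinOutput() {
  return {
      {0, 0, "x", "Y"},
      {std::nullopt, 1, std::nullopt, "Z"},
      {2, std::nullopt, "A", std::nullopt},
  };
}
```

Calling ConsolidateKeys(ExampleOuterJoinOutput()) yields rows with keys 0, 1
and 2 and unchanged payloads, matching the second table in the description.
As the description notes, the consolidated form can no longer tell a row that
matched with all-null payloads apart from a row that had no match.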



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
