Weston Pace created ARROW-15957:
-----------------------------------
Summary: [C++] Add option to consolidate key columns in hash join
Key: ARROW-15957
URL: https://issues.apache.org/jira/browse/ARROW-15957
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Weston Pace
Currently the hash join outputs key columns from both sides. In an outer join
this helps distinguish a row that matched but had entirely null payloads on one
side from a row that had no match on that side.
However, that distinction is sometimes not very important and many databases
will simply coalesce the key columns into one. For example, we might get an
outer join result today that looks like:
{noformat}
L_KEY | R_KEY | L_PAY | R_PAY
0     | 0     | x     | Y
NULL  | 1     | NULL  | Z
2     | NULL  | A     | NULL
{noformat}
Ideally we could specify a "combine key columns" option to get a result that
looks like:
{noformat}
KEY | L_PAY | R_PAY
0   | x     | Y
1   | NULL  | Z
2   | A     | NULL
{noformat}
This can be done today with an extra project step that coalesces the two key
columns, and building it into the join isn't likely to offer much performance
benefit, but from a usability perspective it would be nice if users didn't have
to add that step themselves.
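
To make the requested behavior concrete, here is a minimal sketch of the
coalescing logic in plain C++. It does not use the Arrow API: JoinedRow,
ConsolidatedRow and ConsolidateKeys are hypothetical names made up for this
illustration, and std::optional stands in for a nullable column value. In Arrow
itself the equivalent project step would presumably apply the existing
"coalesce" compute function to the two key columns.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical row shapes for illustration only; these are not Arrow types.
// An empty optional stands in for NULL.
struct JoinedRow {
  std::optional<int> l_key;
  std::optional<int> r_key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

struct ConsolidatedRow {
  std::optional<int> key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Coalesce the two key columns into one: take the left key when it is
// present, otherwise fall back to the right key. Payloads pass through.
std::vector<ConsolidatedRow> ConsolidateKeys(const std::vector<JoinedRow>& rows) {
  std::vector<ConsolidatedRow> out;
  out.reserve(rows.size());
  for (const JoinedRow& row : rows) {
    out.push_back({row.l_key.has_value() ? row.l_key : row.r_key,
                   row.l_pay, row.r_pay});
  }
  return out;
}
```

Applied to the three rows of the first table, this yields the rows of the
second table. Note that the matched/unmatched distinction described above is
lost in the consolidated output, which is why this would be an option rather
than the default.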