Weston Pace created ARROW-15957:
-----------------------------------

             Summary: [C++] Add option to consolidate key columns in hash join
                 Key: ARROW-15957
                 URL: https://issues.apache.org/jira/browse/ARROW-15957
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Weston Pace


Currently the hash join outputs the key columns from both sides.  In an outer
join this helps distinguish a row that matched but had entirely null payloads
on one side from a row that had no match on that side.

However, that distinction is sometimes not very important and many databases 
will simply coalesce the key columns into one.  For example, we might get an 
outer join result today that looks like:

{noformat}
L_KEY | R_KEY | L_PAY | R_PAY
    0       0       x       Y
 NULL       1    NULL       Z
    2    NULL       A    NULL
{noformat}

Ideally we could specify a "combine key columns" option to get a result that 
looks like:

{noformat}
KEY | L_PAY | R_PAY
  0       x       Y
  1    NULL       Z
  2       A    NULL
{noformat}

This can be done today with an extra project step, so a built-in option isn't
likely to offer much performance benefit.  From a usability perspective,
though, it would be nice if users didn't have to add that step themselves.
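
As a rough illustration of what that extra project step computes, the sketch
below coalesces the two key columns row by row in plain C++.  It does not use
the Arrow API: JoinedRow, ConsolidatedRow, Coalesce, ConsolidateKeys and
ExampleJoinOutput are hypothetical names made up for this example, and the
key and payload types (int and string) are assumptions.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical row layouts for illustration only; these are not Arrow types.
// std::nullopt stands in for NULL.
struct JoinedRow {
  std::optional<int> l_key;
  std::optional<int> r_key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

struct ConsolidatedRow {
  std::optional<int> key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Returns the first non-null argument, like SQL COALESCE(a, b).
template <typename T>
std::optional<T> Coalesce(const std::optional<T>& a, const std::optional<T>& b) {
  return a.has_value() ? a : b;
}

// The "extra project step": replace L_KEY and R_KEY with a single KEY column
// and pass the payload columns through unchanged.
std::vector<ConsolidatedRow> ConsolidateKeys(const std::vector<JoinedRow>& rows) {
  std::vector<ConsolidatedRow> out;
  out.reserve(rows.size());
  for (const JoinedRow& row : rows) {
    out.push_back(
        ConsolidatedRow{Coalesce(row.l_key, row.r_key), row.l_pay, row.r_pay});
  }
  return out;
}

// The outer join result shown in the first table above.
std::vector<JoinedRow> ExampleJoinOutput() {
  return {
      JoinedRow{0, 0, "x", "Y"},
      JoinedRow{std::nullopt, 1, std::nullopt, "Z"},
      JoinedRow{2, std::nullopt, "A", std::nullopt},
  };
}
```

Calling ConsolidateKeys(ExampleJoinOutput()) yields rows with keys 0, 1 and 2,
matching the second table.  Once the keys are coalesced, a row that matched
with null payloads and a row that did not match can no longer be told apart,
which is why this would be an option rather than the default.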



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
