Jorge created ARROW-9423:
----------------------------

             Summary: Add join
                 Key: ARROW-9423
                 URL: https://issues.apache.org/jira/browse/ARROW-9423
             Project: Apache Arrow
          Issue Type: Task
          Components: Rust - DataFusion
            Reporter: Jorge


A major operation in analytics is the join. This issue concerns adding the join 
operation.

Given the complexity of this task, I propose starting with a subset of all 
joins: an inner join whose "ON" clause can only be a set of column names 
(i.e. no expressions).
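
To make the proposed scope concrete, here is a minimal sketch of an inner join 
restricted to column-name equality. It uses plain standard-library types, not 
Arrow arrays or any DataFusion API. The names Row, row and inner_join are 
hypothetical and exist only for illustration, and the build-then-probe hash 
approach is one possible choice, not a decided design.

```rust
use std::collections::HashMap;

/// Hypothetical row type: column name -> value (strings keep the sketch simple).
type Row = HashMap<String, String>;

/// Hypothetical helper that builds a row from (column, value) pairs.
fn row(cols: &[(&str, &str)]) -> Row {
    cols.iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Inner join of `left` and `right` on equality of the columns named in `on`.
/// Rows that lack one of the key columns never match and are skipped.
fn inner_join(left: &[Row], right: &[Row], on: &[&str]) -> Vec<Row> {
    // Extract the join key of a row, or None if a key column is absent.
    let key_of = |r: &Row| -> Option<Vec<String>> {
        on.iter().map(|c| r.get(*c).cloned()).collect()
    };

    // Build phase: index the right side by join key.
    let mut index: HashMap<Vec<String>, Vec<&Row>> = HashMap::new();
    for r in right {
        if let Some(key) = key_of(r) {
            index.entry(key).or_default().push(r);
        }
    }

    // Probe phase: emit one output row per (left, right) pair with equal keys.
    let mut out = Vec::new();
    for l in left {
        if let Some(key) = key_of(l) {
            if let Some(matches) = index.get(&key) {
                for r in matches {
                    let mut joined = l.clone();
                    // Add the right-side columns; on a name clash the left value wins.
                    for (k, v) in r.iter() {
                        joined.entry(k.clone()).or_insert_with(|| v.clone());
                    }
                    out.push(joined);
                }
            }
        }
    }
    out
}
```

Calling inner_join(&left, &right, &["id"]) returns only the rows whose "id" 
values appear on both sides. Because "ON" is limited to column names, the key 
is just a vector of column values and no expression evaluation is needed.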

Suggested definition of done (DOD):

* physical plan to execute the join
* logical plan with the join
* SQL planner with the join
* tests on each of the above

One idea to perform this join in parallel is to take each RecordBatch on the 
left and join it with the records on the right. Another way is to first 
hash-partition both sides by key, sort each partition, and then perform a 
"SortMergeJoin" on each pair of matching partitions. There may be better ways 
to achieve this, though.
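
The second idea can be sketched roughly as follows, again with plain 
standard-library types and not Arrow or DataFusion APIs. The functions 
hash_partition, sort_merge_join and partitioned_join are hypothetical names, 
rows are simplified to (i64 key, payload) pairs, and the partitions are 
processed sequentially here even though each pair is independent and could be 
handed to its own thread.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assign each row to one of `n` partitions (n > 0) by hashing its key, so that
/// equal keys from either side end up at the same partition index.
fn hash_partition<V: Clone>(rows: &[(i64, V)], n: usize) -> Vec<Vec<(i64, V)>> {
    let mut parts = vec![Vec::new(); n];
    for (k, v) in rows {
        let mut h = DefaultHasher::new();
        k.hash(&mut h);
        parts[(h.finish() % n as u64) as usize].push((*k, v.clone()));
    }
    parts
}

/// Inner sort-merge join of two slices that are already sorted by key.
fn sort_merge_join<L: Clone, R: Clone>(
    left: &[(i64, L)],
    right: &[(i64, R)],
) -> Vec<(i64, L, R)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        let (lk, rk) = (left[i].0, right[j].0);
        if lk < rk {
            i += 1;
        } else if lk > rk {
            j += 1;
        } else {
            // Find the run of equal keys on each side and emit their cross product.
            let i_end = i + left[i..].iter().take_while(|r| r.0 == lk).count();
            let j_end = j + right[j..].iter().take_while(|r| r.0 == lk).count();
            for l in &left[i..i_end] {
                for r in &right[j..j_end] {
                    out.push((lk, l.1.clone(), r.1.clone()));
                }
            }
            i = i_end;
            j = j_end;
        }
    }
    out
}

/// Hash-partition both sides, sort each partition by key, then merge-join
/// the partitions pairwise.
fn partitioned_join<L: Clone, R: Clone>(
    left: &[(i64, L)],
    right: &[(i64, R)],
    n: usize,
) -> Vec<(i64, L, R)> {
    let mut out = Vec::new();
    let pairs = hash_partition(left, n).into_iter().zip(hash_partition(right, n));
    for (mut l, mut r) in pairs {
        l.sort_by_key(|row| row.0);
        r.sort_by_key(|row| row.0);
        out.extend(sort_merge_join(&l, &r));
    }
    out
}
```

Since both sides are partitioned with the same hash function, matching keys 
can only meet within the same partition index, which is what makes the 
per-partition joins independent of each other.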



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
