Andy Grove created ARROW-11094:
----------------------------------
Summary: [Rust] [DataFusion] Implement Sort-Merge Join
Key: ARROW-11094
URL: https://issues.apache.org/jira/browse/ARROW-11094
Project: Apache Arrow
Issue Type: New Feature
Components: Rust - DataFusion
Reporter: Andy Grove
Fix For: 4.0.0
The current hash join works well when one side of the join can be loaded into
memory but cannot scale beyond the available RAM.
The advantage of implementing SMJ (Sort-Merge Join) is that we can sort the
left and right partitions in parallel and then stream both sides of the join by
merging these sorted partitions and we do not need to load one side into
memory. At most, we need to load all batches from both sides that contain the
current join key values.
https://en.wikipedia.org/wiki/Sort-merge_join
--
This message was sent by Atlassian Jira
(v8.3.4#803005)