metesynnada opened a new issue, #5321:
URL: https://github.com/apache/arrow-datafusion/issues/5321
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
The ordinary hash join (OHJ) works well when one side of the data is static
and fits into memory. Sort-merge join (SMJ), on the other hand, is more
effective when the join keys are already sorted. If the columns in the join
filter expression have order guarantees but the join keys do not, both OHJ and
SMJ can perform suboptimally.
This is where Symmetric Hash Join (SHJ) comes in. SHJ fills this gap in join
use cases by supporting filter expressions with order guarantees, such as
sliding windows.
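The core mechanism can be illustrated with a minimal sketch of the textbook
symmetric hash join. This is a generic formulation, not DataFusion's
implementation. Each input keeps its own hash table. Every arriving row first
probes the opposite side's table and is then inserted into its own, so neither
side has to be fully materialized before output starts. The `Side`, `Row`, and
`SymmetricHashJoin` names are hypothetical. The sketch also omits the join
filter and state pruning, so its buffers grow without bound.

```rust
use std::collections::HashMap;

/// Which input a row arrived on (hypothetical type for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Side {
    Left,
    Right,
}

/// A hypothetical row: an equi-join key plus one payload value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Row {
    pub key: i64,
    pub value: i64,
}

/// Minimal symmetric hash join state: one hash table per input.
/// No join filter and no pruning are applied in this sketch.
#[derive(Default)]
pub struct SymmetricHashJoin {
    left: HashMap<i64, Vec<Row>>,
    right: HashMap<i64, Vec<Row>>,
}

impl SymmetricHashJoin {
    /// Processes one incoming row: probe the opposite side's table for rows
    /// with the same key, then insert the row into its own side's table.
    /// Returns the matched pairs as (left row, right row).
    pub fn push(&mut self, side: Side, row: Row) -> Vec<(Row, Row)> {
        // Split-borrow the two tables: the row's own side (to insert into)
        // and the opposite side (to probe).
        let (own, other) = match side {
            Side::Left => (&mut self.left, &self.right),
            Side::Right => (&mut self.right, &self.left),
        };
        let matches = other
            .get(&row.key)
            .map(|candidates| {
                candidates
                    .iter()
                    .map(|&c| match side {
                        Side::Left => (row, c),
                        Side::Right => (c, row),
                    })
                    .collect::<Vec<_>>()
            })
            .unwrap_or_default();
        own.entry(row.key).or_default().push(row);
        matches
    }
}
```

Because both sides build and probe, the operator can emit results
incrementally from two unbounded streams. An ordinary hash join, by contrast,
must finish building one side before probing with the other.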
For example, consider the following query:
```sql
SELECT * FROM left_table, right_table
WHERE
left_key = right_key AND
a > b + 3 AND
a < b + 10
```
In this scenario, the columns **`a`** and **`b`** are sorted, but the join
keys are not. SMJ therefore wouldn't be effective, and OHJ may struggle with
low-cardinality join keys. SHJ extends the join capabilities of DataFusion by
handling such use cases efficiently. Ordinary hash join typically remains the
preferable option when both sources are finite. When both sources are
unbounded, a `PipelineFixer` sub-rule can change the join type to SHJ.
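The following sketch shows why order guarantees on `a` and `b` matter. It
computes the pruning bounds implied by the example filter, assuming `a` comes
from `left_table`, `b` comes from `right_table`, and both arrive in ascending
order.

A buffered left row with value `a` can only match right rows with
`a - 10 < b < a - 3`. Once the newest right value reaches `a - 3`, that row
can be dropped. Symmetrically, a buffered right row with value `b` can be
dropped once the newest left value reaches `b + 10`. The helper names
(`filter`, `can_prune_left`, `can_prune_right`, `prune`) are hypothetical.

```rust
/// The filter `a > b + 3 AND a < b + 10` from the example query
/// (assumes `a` is a left_table column and `b` a right_table column).
pub fn filter(a: i64, b: i64) -> bool {
    a > b + 3 && a < b + 10
}

/// A buffered left row with value `a` can never match a future right row
/// once the newest right value `b_latest` satisfies `a <= b_latest + 3`:
/// every later `b` is at least `b_latest` (ascending order assumed),
/// so `a > b + 3` can no longer hold.
pub fn can_prune_left(a: i64, b_latest: i64) -> bool {
    a <= b_latest + 3
}

/// A buffered right row with value `b` can never match a future left row
/// once the newest left value `a_latest` satisfies `a_latest >= b + 10`:
/// every later `a` is at least `a_latest`, so `a < b + 10` can no longer hold.
pub fn can_prune_right(b: i64, a_latest: i64) -> bool {
    a_latest >= b + 10
}

/// Drops buffered values for which `prunable` returns true.
pub fn prune(buffer: &mut Vec<i64>, prunable: impl Fn(i64) -> bool) {
    buffer.retain(|&v| !prunable(v));
}
```

In this sketch, each side only buffers the rows that still fall inside the
sliding window defined by the filter. Without such bounds, a join over
unbounded inputs would have to retain every row it has seen.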
**Describe the solution you'd like**
A skeleton implementation of SHJ that can be improved on. It may initially be
limited to partitioned mode and lack full support for output order
information, but it should be extensible enough that these capabilities can
be implemented later. In detail:
- Provide a way to support sliding window semantics in `PhysicalExpr`s
- Add a sub-rule to `PipelineFixer` that replaces the `HashJoin` with SHJ when
  necessary.
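The selection logic of the proposed `PipelineFixer` sub-rule could look
roughly like the following sketch. It uses a hypothetical, simplified `Plan`
enum rather than DataFusion's real `ExecutionPlan` trait. It also assumes the
rewrite is restricted to partitioned joins whose inputs are both unbounded,
matching the scope described above.

```rust
/// Hypothetical, simplified plan nodes; DataFusion's actual `ExecutionPlan`
/// trait and `PipelineFixer` rule are not modeled here.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Source { name: String, unbounded: bool },
    HashJoin { left: Box<Plan>, right: Box<Plan>, partitioned: bool },
    SymmetricHashJoin { left: Box<Plan>, right: Box<Plan> },
}

impl Plan {
    /// In this sketch, a node is treated as unbounded if any input below it is.
    pub fn is_unbounded(&self) -> bool {
        match self {
            Plan::Source { unbounded, .. } => *unbounded,
            Plan::HashJoin { left, right, .. } | Plan::SymmetricHashJoin { left, right } => {
                left.is_unbounded() || right.is_unbounded()
            }
        }
    }
}

/// Hypothetical sub-rule: bottom-up, rewrite a partitioned hash join into a
/// symmetric hash join when both inputs are unbounded; otherwise keep the
/// ordinary hash join, which remains preferable for finite inputs.
pub fn replace_hash_join(plan: Plan) -> Plan {
    match plan {
        Plan::HashJoin { left, right, partitioned } => {
            // Rewrite the children first so nested joins are handled too.
            let left = Box::new(replace_hash_join(*left));
            let right = Box::new(replace_hash_join(*right));
            if partitioned && left.is_unbounded() && right.is_unbounded() {
                Plan::SymmetricHashJoin { left, right }
            } else {
                Plan::HashJoin { left, right, partitioned }
            }
        }
        Plan::SymmetricHashJoin { left, right } => Plan::SymmetricHashJoin {
            left: Box::new(replace_hash_join(*left)),
            right: Box::new(replace_hash_join(*right)),
        },
        source @ Plan::Source { .. } => source,
    }
}
```

In this sketch, a join with one finite input is left untouched. The ordinary
hash join can still build on the finite side, so SHJ offers no pipelining
advantage there.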
**Describe alternatives you've considered**
NA
**Additional context**
NA
--
This is an automated message from the Apache Git Service.