xudong963 commented on issue #4356:
URL:
https://github.com/apache/arrow-datafusion/issues/4356#issuecomment-1326562094
> The current code base has many `match` path for each `Join_type`, each
> `join_type` has different logic and path, it easy to produce the bugs when we
> add feature in the `HashJoin`.
Yes, I agree.
> split vectorization `HashJoin` to three phase:
>
> 1. get the result of matched equal join : left_idx and right_idx
> 2. apply non_equal filter to `left_idx and right_idx` and get the
>    filter_left_idx with filter_right_idx
> 3. according to the `Join Type` to construct the result
For HashJoin, there are two big phases: **build** and **probe**:
1. In the **build** phase, we almost don't care about the **JoinType**.
2. In the **probe** phase, the **JoinType** determines the direction. So how about
splitting the `match` paths at the beginning of the **probe** phase:
```rust
match join_type {
    JoinType::Inner => probe_inner_join(),
    JoinType::Left => probe_left_join(),
    // ... one arm per remaining JoinType variant
}
```
In each probe method, we can process both the equi and the non-equi
conditions. The results of the non-equi conditions depend on the **JoinType**.
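
To make the idea concrete, here is a rough, self-contained sketch of what I have in mind. It is not the DataFusion implementation: the `JoinType` enum is a two-variant stand-in, keys are plain `i64` slices instead of Arrow arrays, and `build`, `matched`, `probe_inner_join`, `probe_left_join` and `hash_join` are hypothetical names. The build side is assumed to be the left input.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real JoinType, limited to two variants.
#[derive(Clone, Copy)]
enum JoinType {
    Inner,
    Left,
}

// Output rows as (left index, right index); None marks a null-padded side.
type Pairs = Vec<(Option<usize>, Option<usize>)>;

// Build phase: hash the left keys. Nothing here depends on the JoinType.
fn build(left: &[i64]) -> HashMap<i64, Vec<usize>> {
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, k) in left.iter().enumerate() {
        table.entry(*k).or_default().push(i);
    }
    table
}

// Shared by all probe methods: equi matches that also pass the non-equi filter.
fn matched(
    table: &HashMap<i64, Vec<usize>>,
    right: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (r, k) in right.iter().enumerate() {
        if let Some(lefts) = table.get(k) {
            for &l in lefts {
                if filter(l, r) {
                    out.push((l, r));
                }
            }
        }
    }
    out
}

// Inner join: rows rejected by the filter simply disappear.
fn probe_inner_join(
    table: &HashMap<i64, Vec<usize>>,
    right: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Pairs {
    matched(table, right, filter)
        .into_iter()
        .map(|(l, r)| (Some(l), Some(r)))
        .collect()
}

// Left join: left rows with no surviving match are emitted with a null right side.
fn probe_left_join(
    table: &HashMap<i64, Vec<usize>>,
    left_len: usize,
    right: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Pairs {
    let mut seen = vec![false; left_len];
    let mut out: Pairs = Vec::new();
    for (l, r) in matched(table, right, filter) {
        seen[l] = true;
        out.push((Some(l), Some(r)));
    }
    for (l, was_seen) in seen.iter().enumerate() {
        if !*was_seen {
            out.push((Some(l), None));
        }
    }
    out
}

// The only place that matches on JoinType: the start of the probe phase.
fn hash_join(
    join_type: JoinType,
    left: &[i64],
    right: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Pairs {
    let table = build(left);
    match join_type {
        JoinType::Inner => probe_inner_join(&table, right, filter),
        JoinType::Left => probe_left_join(&table, left.len(), right, filter),
    }
}
```

Note how the same filter rejection drops the row in `probe_inner_join` but turns it into a null-padded row in `probe_left_join`. That is what I mean by the non-equi results depending on the **JoinType**.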