liukun4515 commented on issue #4356:
URL:
https://github.com/apache/arrow-datafusion/issues/4356#issuecomment-1327188814
> > The current code base has many `match` paths for each `JoinType`; each `JoinType` has its own logic and path, so it is easy to introduce bugs when we add a feature to `HashJoin`.
>
> Yes, I agree.
>
> > Split the vectorized `HashJoin` into three phases:
> >
> > 1. Get the result of the matched equi-join: `left_idx` and `right_idx`.
> > 2. Apply the non-equi filter to `left_idx` and `right_idx` to get `filter_left_idx` and `filter_right_idx`.
> > 3. Construct the result according to the `JoinType`.
>
> For HashJoin, there are two big phases: **build** and **probe**:
>
> 1. For the **build** phase, we mostly don't care about **JoinType**.
> 2. For the **probe** phase, **JoinType** determines the direction. So how about splitting the `match` paths at the beginning of the **probe** phase:
> ```rust
> match join_type {
>     JoinType::Inner => probe_inner_join(),
>     JoinType::Left => probe_left_join(),
>     // ... one arm per remaining join type
> }
> ```
>
> In each probe method, we can process both the non-equi and the equi conditions. The results of the non-equi conditions depend on **JoinType**.

The probe phase has many stages that are common to all join types.
In the vectorized hash join, the first stage is to get the left/right
indices that match the `on` join condition.
Next, use the left/right indices to generate the result batch according to
the join type. Some join types must also maintain a left-side bitmap in order
to produce their final result, for example left/full/leftanti/leftsemi.
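
To make the shared stages concrete, here is a minimal sketch using plain `Vec`s instead of Arrow arrays. It is not the DataFusion implementation: the names `matched_indices`, `apply_filter`, `build_output`, `left_rows` and the reduced `JoinType` enum are hypothetical, the join key is assumed to be a single `i64` column, and only one probe batch is handled (a streaming version would keep the bitmap across batches and emit unmatched left rows at the end).

```rust
use std::collections::HashMap;

// Hypothetical, reduced set of join types for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    LeftSemi,
    LeftAnti,
}

// An output row is a pair of optional row indices into the left and right inputs.
type Row = (Option<usize>, Option<usize>);

// Common stage 1: build a hash table on the left keys, probe it with the
// right keys, and return the matched (left_idx, right_idx) index vectors.
fn matched_indices(left_keys: &[i64], right_keys: &[i64]) -> (Vec<usize>, Vec<usize>) {
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, k) in left_keys.iter().enumerate() {
        table.entry(*k).or_default().push(i);
    }
    let (mut left_idx, mut right_idx) = (Vec::new(), Vec::new());
    for (j, k) in right_keys.iter().enumerate() {
        if let Some(rows) = table.get(k) {
            for &i in rows {
                left_idx.push(i);
                right_idx.push(j);
            }
        }
    }
    (left_idx, right_idx)
}

// Common stage 2: keep only the index pairs accepted by a non-equi predicate.
fn apply_filter(
    left_idx: &[usize],
    right_idx: &[usize],
    keep: impl Fn(usize, usize) -> bool,
) -> (Vec<usize>, Vec<usize>) {
    left_idx
        .iter()
        .zip(right_idx)
        .filter(|&(&l, &r)| keep(l, r))
        .map(|(&l, &r)| (l, r))
        .unzip()
}

// Left rows whose bitmap entry equals `want`, emitted without a right side.
fn left_rows(visited: &[bool], want: bool) -> Vec<Row> {
    (0..visited.len())
        .filter(|&i| visited[i] == want)
        .map(|i| (Some(i), None))
        .collect()
}

// Join-type-specific stage: only here does the join type matter.
fn build_output(
    join_type: JoinType,
    left_len: usize,
    left_idx: &[usize],
    right_idx: &[usize],
) -> Vec<Row> {
    // Left-side bitmap: marks every left row that matched at least once.
    let mut visited = vec![false; left_len];
    for &i in left_idx {
        visited[i] = true;
    }
    let matched = || -> Vec<Row> {
        left_idx
            .iter()
            .zip(right_idx)
            .map(|(&l, &r)| (Some(l), Some(r)))
            .collect()
    };
    match join_type {
        JoinType::Inner => matched(),
        JoinType::Left => {
            let mut out = matched();
            out.extend(left_rows(&visited, false));
            out
        }
        JoinType::LeftSemi => left_rows(&visited, true),
        JoinType::LeftAnti => left_rows(&visited, false),
    }
}
```

With this split, `matched_indices` and `apply_filter` are written once, and the `match` on the join type appears only in the final stage, where the bitmap decides which left rows are emitted.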
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]