[GitHub] [arrow-datafusion] xudong963 commented on issue #4356: refactor the code of the `HashJoin`

GitBox Thu, 24 Nov 2022 07:00:53 -0800


xudong963 commented on issue #4356:
URL: 
https://github.com/apache/arrow-datafusion/issues/4356#issuecomment-1326562094


   > The current code base has many `match` paths for each `join_type`. Each `join_type` has different logic and its own path, so it is easy to introduce bugs when we add a feature to `HashJoin`.
   
   Yes, I agree.
   
   > Split the vectorized `HashJoin` into three phases:
   > 
   > 1. Get the result of the matched equi-join: `left_idx` and `right_idx`.
   > 2. Apply the non-equi filter to `left_idx` and `right_idx` to get `filter_left_idx` and `filter_right_idx`.
   > 3. Construct the result according to the `JoinType`.
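
To make the three phases quoted above concrete, here is a minimal sketch over plain integer keys. It is not DataFusion code: `Kind`, `equi_matches`, `apply_filter`, and `construct` are hypothetical names, rows are represented only by their indices, and only two join kinds are covered.

```rust
use std::collections::HashMap;

/// Hypothetical join kinds for this sketch (not DataFusion's `JoinType`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    Inner,
    Left,
}

/// Phase 1: collect (left_idx, right_idx) pairs whose equi-join keys match.
pub fn equi_matches(left_keys: &[i32], right_keys: &[i32]) -> Vec<(usize, usize)> {
    let mut table: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, k) in left_keys.iter().enumerate() {
        table.entry(*k).or_default().push(i);
    }
    let mut pairs = Vec::new();
    for (r, k) in right_keys.iter().enumerate() {
        if let Some(ls) = table.get(k) {
            pairs.extend(ls.iter().map(|&l| (l, r)));
        }
    }
    pairs
}

/// Phase 2: keep only the pairs that pass the non-equi filter.
pub fn apply_filter(
    pairs: Vec<(usize, usize)>,
    filter: impl Fn(usize, usize) -> bool,
) -> Vec<(usize, usize)> {
    pairs.into_iter().filter(|&(l, r)| filter(l, r)).collect()
}

/// Phase 3: shape the output by join kind. `None` marks a null-padded right side.
pub fn construct(
    kind: Kind,
    left_len: usize,
    pairs: &[(usize, usize)],
) -> Vec<(usize, Option<usize>)> {
    let mut out: Vec<(usize, Option<usize>)> =
        pairs.iter().map(|&(l, r)| (l, Some(r))).collect();
    if kind == Kind::Left {
        // A left join also emits every left row that has no surviving match.
        let mut seen = vec![false; left_len];
        for &(l, _) in pairs {
            seen[l] = true;
        }
        out.extend((0..left_len).filter(|&l| !seen[l]).map(|l| (l, None)));
    }
    out
}
```

In this sketch only `construct` looks at the join kind. Phases 1 and 2 are shared by every kind.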
   
   For HashJoin, there are two big phases: **build** and **probe**:
   
   1. In the **build** phase, we mostly don't care about **JoinType**.
   2. In the **probe** phase, **JoinType** determines the direction. So how about splitting the `match` paths at the beginning of the **probe** phase?
       ```rust
        match join_type {
            JoinType::Inner => probe_inner_join(),
            JoinType::Left => probe_left_join(),
            // ... one arm per remaining join type
        }
       ```
        In each probe method, we can process both the equi and the non-equi conditions. The results of the non-equi conditions depend on **JoinType**.
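
A slightly fuller sketch of that idea, assuming a toy join over integer keys where the left side is the build side. `JoinKind`, `Row`, `build`, `probe`, and the signatures of the two probe functions are hypothetical and are not DataFusion's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a join-type enum; only two variants are sketched.
#[derive(Clone, Copy, Debug)]
pub enum JoinKind {
    Inner,
    Left,
}

/// Output row: a left index plus an optional right index (`None` = null-padded).
pub type Row = (usize, Option<usize>);

/// Build phase: identical for every join kind.
pub fn build(left_keys: &[i64]) -> HashMap<i64, Vec<usize>> {
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, &k) in left_keys.iter().enumerate() {
        table.entry(k).or_default().push(i);
    }
    table
}

/// Inner probe: emit a pair only if both the equi key and the filter match.
fn probe_inner_join(
    table: &HashMap<i64, Vec<usize>>,
    right_keys: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Vec<Row> {
    let mut out = Vec::new();
    for (r, k) in right_keys.iter().enumerate() {
        for &l in table.get(k).into_iter().flatten() {
            if filter(l, r) {
                out.push((l, Some(r)));
            }
        }
    }
    out
}

/// Left probe: the inner result plus every left row without a surviving match.
fn probe_left_join(
    table: &HashMap<i64, Vec<usize>>,
    left_len: usize,
    right_keys: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Vec<Row> {
    let mut out = probe_inner_join(table, right_keys, filter);
    let mut matched = vec![false; left_len];
    for &(l, _) in &out {
        matched[l] = true;
    }
    out.extend((0..left_len).filter(|&l| !matched[l]).map(|l| (l, None)));
    out
}

/// Probe phase: the join kind is matched once, at the entry point.
pub fn probe(
    kind: JoinKind,
    table: &HashMap<i64, Vec<usize>>,
    left_len: usize,
    right_keys: &[i64],
    filter: &dyn Fn(usize, usize) -> bool,
) -> Vec<Row> {
    match kind {
        JoinKind::Inner => probe_inner_join(table, right_keys, filter),
        JoinKind::Left => probe_left_join(table, left_len, right_keys, filter),
    }
}
```

Here a left row whose only equi match is rejected by the non-equi filter is dropped by `probe_inner_join` but null-padded by `probe_left_join`, which is why the filter result has to be interpreted per join type.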


