liukun4515 commented on issue #5022:
URL:
https://github.com/apache/arrow-datafusion/issues/5022#issuecomment-1408038396
> > Other physicalexec have the same behavior?
>
> I don't know if there is other physical plan has the same behavior. but
> maybe executing the same partition one more time is not a good idea, it will
> waste a lot of time, a better way is to cache the result(`OnceFut`), or only
> call once.
>
> `NestedLoopJoinExec` will sometimes build-left, and sometimes build-right
> now, and they are using the same logic, I think a better way is to give
> `build-left` a logic to handle join, and give `build-right` another logic. Then
> the partitions of the `non-build-side` can be executed only once.
>
> The bug of example is the case of `build-right`, I see the
> `NestedLoopJoinStream` is using `build-left` logic.
>
> I think maybe it is an another cause. @liukun4515 @alamb

I think `build-side` and `probe-side` are hash join concepts.
For a nested loop join (NLJ), the relevant terms are the outer side (outer table) and the inner side (inner table).
```
for outer-row in outer-table:
    for inner-row in inner-table:
        check-join(outer-row, inner-row)
```
In the most basic implementation, the inner table is traversed once per outer
row, so it is scanned many times. In the current implementation, the left table
is the outer table and the right table is the inner table.
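
To make the cost of the basic loop concrete, here is a minimal Rust sketch of it. It is not DataFusion code: `nested_loop_join` and its signature are hypothetical names chosen for illustration, and tables are modelled as plain slices rather than record batches. It returns the matched pairs together with a count of inner rows visited, which shows the inner table being scanned once per outer row.

```rust
/// Illustrative nested loop join over in-memory slices (hypothetical helper,
/// not part of DataFusion). For every outer row the whole inner table is
/// scanned, so the inner side is visited `outer.len() * inner.len()` times.
fn nested_loop_join<L: Clone, R: Clone>(
    outer: &[L],
    inner: &[R],
    on: impl Fn(&L, &R) -> bool,
) -> (Vec<(L, R)>, usize) {
    let mut matched = Vec::new();
    let mut inner_rows_visited = 0;
    for outer_row in outer {
        for inner_row in inner {
            inner_rows_visited += 1;
            // The join predicate plays the role of `check-join`.
            if on(outer_row, inner_row) {
                matched.push((outer_row.clone(), inner_row.clone()));
            }
        }
    }
    (matched, inner_rows_visited)
}
```

With three outer rows and three inner rows, the counter reports nine inner-row visits regardless of how many pairs actually match.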
For an `inner join` with multiple left partitions and a single right partition:
```
for left-partition-x in left-partitions:
    join(load-left(left-partition-x), load-right(single-right-partition))
```
I think this iteration scheme matches the NLJ algorithm.
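
The partitioned loop can be sketched in Rust as well, under the same simplifying assumptions. `partitioned_inner_join` and the `load_right` closure are hypothetical stand-ins for illustration, not DataFusion APIs; partitions are plain `Vec<i32>` values. The sketch follows the pseudocode literally, so the single right partition is loaded once for each left partition.

```rust
/// Illustrative partitioned inner join (hypothetical helper, not DataFusion
/// code). Each left partition is joined against the single right partition,
/// which is re-loaded through `load_right` on every iteration.
fn partitioned_inner_join(
    left_partitions: &[Vec<i32>],
    mut load_right: impl FnMut() -> Vec<i32>,
    on: impl Fn(&i32, &i32) -> bool,
) -> Vec<Vec<(i32, i32)>> {
    let mut outputs = Vec::with_capacity(left_partitions.len());
    for left in left_partitions {
        // One load of the right (inner) side per left (outer) partition.
        let right = load_right();
        let mut matched = Vec::new();
        for l in left {
            for r in &right {
                if on(l, r) {
                    matched.push((*l, *r));
                }
            }
        }
        // One output partition per left partition.
        outputs.push(matched);
    }
    outputs
}
```

Counting calls to `load_right` makes the trade-off visible: with two left partitions the right side is loaded twice, which is the repeated work the quoted comment suggests avoiding by caching the result.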