ygf11 commented on issue #5022:
URL:
https://github.com/apache/arrow-datafusion/issues/5022#issuecomment-1413293683
Thanks for the response @liukun4515. I think I have found a way to fix this bug.
> In the current implementation, the left table is the outer table, the
right table is the inner table.
IMO, the relationship between `inner/outer table` and `left/right table` is
not fixed; it is decided by the required distribution (join type).
For `right/right-semi/right-anti` joins,
* left is the single partition side.
* right is the multiple partition side.
* the output partition count is the same as right table.
The algorithm will be like:
```
for x in 0..output-partition-count:
join(right-partition(x), single-left-partition)
```
We can see that the left table will be visited more than once (once per
output partition), while each right partition will be visited only once.
That means right is the outer table side, and left is the inner table side
in the nested loop join.
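The loop above can be sketched in plain Rust. This is only an illustration, not DataFusion code: `Row`, `right_join_partitions`, and the equality predicate on the key are all hypothetical, and partitions are modeled as in-memory vectors rather than streams.

```rust
/// Hypothetical row type for this sketch: (join key, payload).
type Row = (i32, &'static str);

/// Right-joins each right partition against the single left partition.
/// Returns one output partition per right partition, plus the number of
/// times the left partition was scanned.
fn right_join_partitions(
    left: &[Row],
    right_partitions: &[Vec<Row>],
) -> (Vec<Vec<(Option<Row>, Row)>>, usize) {
    let mut left_scans = 0;
    let mut output = Vec::with_capacity(right_partitions.len());

    // Outer side: each right partition is consumed exactly once.
    for right in right_partitions {
        // Inner side: the single left partition is re-read for every
        // output partition.
        left_scans += 1;
        let mut joined = Vec::new();
        for r in right {
            let mut matched = false;
            for l in left {
                if l.0 == r.0 {
                    joined.push((Some(*l), *r));
                    matched = true;
                }
            }
            // Right join: keep right rows that found no left match.
            if !matched {
                joined.push((None, *r));
            }
        }
        output.push(joined);
    }
    (output, left_scans)
}
```

With a two-row left partition and two right partitions, the function returns two output partitions and reports two scans of the left side, which is why the left side plays the inner-table role here.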
> The inner table will be traveled many times in the most basic
implementation.
Yes, visiting a relation many times is a common operation.
In datafusion, other physical plans have a similar requirement, for example
`CrossJoinExec`, which works well with `OnceFut`.
Currently the implementation always adds `OnceFut` for the left table,
which is not correct (the left table may be the outer table side).
I think if we instead add `OnceFut` for the inner table, the queries will
succeed.
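To illustrate the load-once idea for the inner side, here is a minimal sketch using `std::sync::OnceLock` from the standard library. It is not the `OnceFut` API: `CachedInner`, `count_matches`, and the integer keys are hypothetical names chosen for the example, and the "expensive load" is simulated by cloning a vector.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a load-once handle: the inner side is
/// materialized on first access and reused by every later caller.
struct CachedInner {
    source: Vec<i32>,          // pretend this is an expensive input
    rows: OnceLock<Vec<i32>>,  // filled on first access only
    loads: AtomicUsize,        // how many times the load actually ran
}

impl CachedInner {
    fn new(source: Vec<i32>) -> Self {
        Self { source, rows: OnceLock::new(), loads: AtomicUsize::new(0) }
    }

    /// Returns the cached rows, loading them only on the first call.
    fn rows(&self) -> &[i32] {
        self.rows
            .get_or_init(|| {
                self.loads.fetch_add(1, Ordering::SeqCst);
                self.source.clone()
            })
            .as_slice()
    }

    fn load_count(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}

/// Counts key matches per outer partition, sharing one cached inner side.
fn count_matches(inner: &CachedInner, outer_partitions: &[Vec<i32>]) -> Vec<usize> {
    outer_partitions
        .iter()
        .map(|outer| {
            // Every outer partition asks for the inner rows; only the
            // first request triggers the load.
            let inner_rows = inner.rows();
            outer
                .iter()
                .map(|o| inner_rows.iter().filter(|i| *i == o).count())
                .sum::<usize>()
        })
        .collect()
}
```

The point of the design is that the cache wraps whichever side is the inner table for the given join type; the outer partitions are still read once each and never cached.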
I created a new PR, #5156, to fix this.