ygf11 commented on issue #5022:
URL:
https://github.com/apache/arrow-datafusion/issues/5022#issuecomment-1407624893
> NestedLoopJoinExec now sometimes builds left and sometimes builds right,
> but both cases use the same logic. I think a better way is to give
> build-left its own join logic and build-right another. Then each partition
> of the non-build side is executed only once.
Let me explain in more detail.
The main difference is in how each case iterates.
For `build-left`, the left side has a single partition and the right side has
multiple partitions. The iteration looks like:
```rust
// Given a partition number - x
// execute right-partition-x
for batch in right-partition-x {
join(batch, left-data-(single partition))
}
```
For `build-right`, the left side has multiple partitions and the right side
has a single partition. The iteration looks like:
```rust
// Given a partition number - x
// execute left-partition-x
for batch in left-partition-x {
join(batch, right-data(single partition))
}
```
If we use the `build-left` logic for a `build-right` plan (the example in this issue), then:
```rust
// Given a partition number - x
// Execute right-partition-x: the right side has only a single partition,
// so that same partition is executed many times
for batch in right-partition-x {
join(batch, left-data)
}
```
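To make the difference concrete, here is a minimal, self-contained sketch that counts how often each input is executed under the two approaches. It is a toy model, not DataFusion code: `Input`, `join_batch`, `join_side_aware`, and `join_build_left_logic_on_build_right` are hypothetical names, rows are plain `i32` values, and the join condition `left < right` is just an assumed example filter. It also assumes that in the mismatched case each output partition `x` pairs with left partition `x`.

```rust
use std::cell::Cell;

/// Toy stand-in for a plan input: partitions -> batches -> rows (hypothetical type).
struct Input {
    partitions: Vec<Vec<Vec<i32>>>,
    /// Number of times any partition of this input has been executed.
    executions: Cell<usize>,
}

impl Input {
    fn new(partitions: Vec<Vec<Vec<i32>>>) -> Self {
        Input { partitions, executions: Cell::new(0) }
    }

    /// "Executes" one partition: bumps the counter and returns its batches.
    fn execute(&self, partition: usize) -> &[Vec<i32>] {
        self.executions.set(self.executions.get() + 1);
        &self.partitions[partition]
    }
}

/// Joins one streamed batch against collected rows from the other side.
/// `batch_is_left` says which side the batch came from, so output is always (left, right).
fn join_batch(batch: &[i32], other: &[i32], batch_is_left: bool, out: &mut Vec<(i32, i32)>) {
    for &b in batch {
        for &o in other {
            let (l, r) = if batch_is_left { (b, o) } else { (o, b) };
            if l < r {
                // Assumed example join filter: left < right.
                out.push((l, r));
            }
        }
    }
}

/// Side-specific logic: collect the single-partition build side once,
/// then stream every partition of the other side exactly once.
fn join_side_aware(left: &Input, right: &Input, build_left: bool) -> Vec<(i32, i32)> {
    let (build, probe) = if build_left { (left, right) } else { (right, left) };
    let build_rows: Vec<i32> = build.execute(0).concat();
    let mut out = Vec::new();
    for x in 0..probe.partitions.len() {
        for batch in probe.execute(x) {
            join_batch(batch, &build_rows, !build_left, &mut out);
        }
    }
    out
}

/// Build-left iteration applied to a build-right plan: the loop still streams
/// the right side, so its single partition is re-executed per output partition.
fn join_build_left_logic_on_build_right(left: &Input, right: &Input) -> Vec<(i32, i32)> {
    let mut out = Vec::new();
    for x in 0..left.partitions.len() {
        let left_rows: Vec<i32> = left.execute(x).concat();
        for batch in right.execute(0) {
            join_batch(batch, &left_rows, false, &mut out);
        }
    }
    out
}
```

With a three-partition left input and a single-partition right input, `join_side_aware(&left, &right, false)` executes the right side once, while `join_build_left_logic_on_build_right(&left, &right)` executes it three times. Both return the same rows, only in a different order.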