ygf11 commented on issue #5022:
URL: 
https://github.com/apache/arrow-datafusion/issues/5022#issuecomment-1407624893

   > NestedLoopJoinExec will sometimes build-left, and sometimes build-right 
now, and they are using the same logic, I think a better way is to give 
build-left a logic to handle join, and give build-right another logic. Then the 
partitions of the non-build-side can be executed only once.
   
   Let me explain more.
   
   The main difference is the way of iteration.
   For `build-left`, left has a single partition, right has multiple 
partitions. The iteration will be like:
   ```rust
   // Given a partition number - x 
   // execute right-partition-x
   for batch in right-partition-x {
      join(batch, left-data-(single partition)) 
   }
   ```
   
   For `build-right`, left has multiple partitions, right has a single 
partition. The iteration will be like:
   ```rust
   // Given a partition number - x 
   // execute left-partition-x
   for batch in left-partition-x {
      join(batch, right-data(single partition)) 
   }
   ```
   
   If we use `build-left` logic for `build-right`(issue-example), then:
   ```rust
   // Given a partition number - x 
   // Execute right-partition-x --- right partition is a single partition and 
execute many times
   for batch in right-partition-x {
      join(batch, left-data) 
   }
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to