liukun4515 commented on PR #5087:
URL: 
https://github.com/apache/arrow-datafusion/pull/5087#issuecomment-1407273778

   > # Which issue does this PR close?
   > Closes #5022.
   > 
   > # Rationale for this change
   > 1. Currently the required distribution of `NestedLoopJoinExec` is not 
consistent with `CrossJoinExec`, and some optimizations(`JoinSelection`) 
already done for `CrossJoinExec` will not take effect for `NestedLoopJoinExec`.
   > 2. `NestedLoopJoinExec` will collect both left and right to a single 
partition for `Full` join, this may cause performance issue.
   > 
   > We can always collect left for `NestedLoopJoinExec` like `CrossJoinExec` 
does. This will also fix #5022.
   > 
   
   I think it will bring other performance issue for other `join type` when we 
always collect the left the NLJ.
   
   I think the current required distribution for the NLJ will cause performance 
issue, but it will cause error or panic for the implementation.
   
   We should take more time to find the root cause for the panic issue 
https://github.com/apache/arrow-datafusion/issues/5022
   > # What changes are included in this PR?
   > 1. Always collect left side for `NestedLoopJoinExec` like `CrossJoinExec` 
does.
   > 2. For `LEFT/LEFT-SEMI/LEFT-ANTI/FULL`, we need generate the additional 
data for the unmatched rows, so maintain a global state of `visited_left_side` 
in `NestedLoopJoinExec`.
   > 3. Merge partition state - `visited_left_side` to global after one 
partition is finished.
   > 4. Generate the additional data for the unmatched rows based on `global 
state` after all partitions are finished.
   > 
   > # Are these changes tested?
   > Yes.
   > 
   > # Are there any user-facing changes?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to