neilconway opened a new pull request, #23049:
URL: https://github.com/apache/datafusion/pull/23049

   ## Which issue does this PR close?
   
   - Closes #23048
   
   ## Rationale for this change
   
   When the no-filter bitwise sort-merge join path finds a matching key,
   it advances the inner cursor past that key before marking all matching
   outer rows. If the outer key group continues into the next outer batch
   and polling that batch returns `Pending`, `poll_join` resumes from its
   top-level state with the inner cursor already past the matched key.
   
   On resume, the continued outer key can compare as `Less` than the current
   inner key and be treated as unmatched. This can leak rows from `LeftAnti`
   joins, drop rows from `LeftSemi` joins, or produce incorrect mark values.
   
   To fix this issue, we now remember the last key that matched the inner side.
   If a later outer row compares `Less` but still has that key, mark the
   continued outer key group as matched instead of skipping it.
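
The idea above can be sketched in isolation. This is a simplified model, not the code in this PR: `MergeState`, `probe`, and `last_matched` are hypothetical names, keys are plain `i32` values rather than Arrow arrays, and the `Pending` poll is not modeled. Each call to `probe` stands in for an outer row arriving after the inner cursor may already have moved past its key. Checking `last_matched` when the inner side is exhausted is an additional assumption of the sketch.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified model of the merge step for a semi/anti join.
struct MergeState {
    /// Sorted inner-side keys.
    inner: Vec<i32>,
    /// Cursor into `inner`.
    inner_pos: usize,
    /// Key of the most recent outer/inner match, if any.
    last_matched: Option<i32>,
}

impl MergeState {
    fn new(inner: Vec<i32>) -> Self {
        Self { inner, inner_pos: 0, last_matched: None }
    }

    /// Returns true if `outer_key` has a match on the inner side.
    /// Outer keys must be probed in sorted order.
    fn probe(&mut self, outer_key: i32) -> bool {
        loop {
            let Some(&inner_key) = self.inner.get(self.inner_pos) else {
                // Inner side exhausted (assumption of this sketch): only a
                // continued key group can still count as matched.
                return self.last_matched == Some(outer_key);
            };
            match outer_key.cmp(&inner_key) {
                // The inner cursor is ahead of the outer key. Without the
                // remembered key, a continued group would look unmatched.
                Ordering::Less => return self.last_matched == Some(outer_key),
                // The inner cursor is behind: advance it and compare again.
                Ordering::Greater => self.inner_pos += 1,
                Ordering::Equal => {
                    // Advance the inner cursor past the whole key group,
                    // then remember the key for later outer rows.
                    while self.inner.get(self.inner_pos) == Some(&outer_key) {
                        self.inner_pos += 1;
                    }
                    self.last_matched = Some(outer_key);
                    return true;
                }
            }
        }
    }
}
```

With inner keys `[1, 2, 4]`, probing the outer key `2` twice reports a match both times, even though the second probe compares `Less` against the inner key `4`. A following probe of `3` is still reported as unmatched.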
   
   ## What changes are included in this PR?
   
   * Bug fix for SMJ semi/anti-join behavior across outer batches
   * Add unit test
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

