korowa commented on PR #12082:
URL: https://github.com/apache/datafusion/pull/12082#issuecomment-2336680062

   @comphead I've finally got it: in this case SMJ is trying to produce
   output for each (streamed, buffered) join key pair. I guess that's how SMJ
   state management works now: the streamed-side index won't move until all
   buffered-side data has been processed, since that is required to identify
   the current ordering.
   
   ```
   - get the current join key from streamed_batch.join_arrays at
     self.streamed_batch.idx
   - find all batches in buffered_data that contain the join key from step 1
   - if buffered_data.scanning_batch_idx equals the number of batches from
     step 2 and this batch's range.end == num_rows, that probably means SMJ
     has already emitted all the indices from this batch and we are done for
     this particular key
   ```
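
As a rough illustration of the three steps above, here is a minimal Rust sketch. It is not the DataFusion code: the structs are simplified, hypothetical stand-ins (join keys are plain `i64` values instead of Arrow arrays), `key_fully_emitted` is an invented name, and reading "this batch" as the last matching batch is my assumption.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for one buffered-side batch.
struct BufferedBatch {
    /// Join key of every row (real code would hold Arrow arrays).
    join_keys: Vec<i64>,
    /// Rows of this batch already scanned for the current join key.
    range: Range<usize>,
}

impl BufferedBatch {
    fn num_rows(&self) -> usize {
        self.join_keys.len()
    }
}

/// Hypothetical stand-in for the buffered-side state.
struct BufferedData {
    batches: Vec<BufferedBatch>,
    scanning_batch_idx: usize,
}

/// Hypothetical stand-in for the streamed-side state.
struct StreamedBatch {
    join_keys: Vec<i64>,
    idx: usize,
}

/// Returns true when all buffered rows for the current streamed key
/// appear to have been emitted, following the three steps literally.
fn key_fully_emitted(streamed: &StreamedBatch, buffered: &BufferedData) -> bool {
    // Step 1: current join key on the streamed side.
    let Some(&key) = streamed.join_keys.get(streamed.idx) else {
        return false;
    };

    // Step 2: buffered batches that contain that key.
    let matching: Vec<&BufferedBatch> = buffered
        .batches
        .iter()
        .filter(|b| b.join_keys.contains(&key))
        .collect();

    // Step 3: scanning index has reached the number of matching batches,
    // and the last matching batch has been scanned to its final row.
    match matching.last() {
        Some(last) => {
            buffered.scanning_batch_idx == matching.len()
                && last.range.end == last.num_rows()
        }
        None => false,
    }
}
```

The sketch compares join keys explicitly, which is the part described as something that should not normally be needed.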
   I'd say that normally you don't need to compare join keys, and you should
   rely on `buffered_data.scanning_finished()` (or `self.current_ordering ==
   Less`), but in your example both of these conditions are either not
   working or not intended to work (I'm not sure which of the two is the
   case).
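
For comparison, the "normal" signal mentioned above could look roughly like the following toy model. Everything here is an assumption made for illustration: the struct is a simplified stand-in, `done_with_current_key` is a hypothetical helper, and combining the two conditions with `||` is only my reading of the "or" in the sentence.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified stand-in for the buffered-side state.
struct BufferedData {
    num_batches: usize,
    scanning_batch_idx: usize,
}

impl BufferedData {
    /// Assumed meaning: every buffered batch has been scanned.
    fn scanning_finished(&self) -> bool {
        self.scanning_batch_idx == self.num_batches
    }
}

/// Hypothetical helper: done with the current streamed key when scanning
/// has finished, or when the streamed key sorts before the buffered key.
fn done_with_current_key(buffered: &BufferedData, current_ordering: Ordering) -> bool {
    buffered.scanning_finished() || current_ordering == Ordering::Less
}
```

No join keys are compared here, only the scanning position and the ordering.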
   
   I also hope to start spending some time on SMJ due to 
https://github.com/apache/datafusion/issues/12359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

