comphead commented on PR #12082:
URL: https://github.com/apache/datafusion/pull/12082#issuecomment-2325221509

   > At what point in the code you are able to observe `0..1` for the key 2?
   
   I'm running the test from https://github.com/apache/datafusion/pull/12082#issuecomment-2319361185 and debugging the `freeze_streamed` function. With batch size 2, I see a batch distribution like the one in https://github.com/apache/datafusion/pull/12082#issuecomment-2319492383
   
   There you can see a buffered batch with these join arrays:
   ```
           join_arrays: [
               PrimitiveArray<Int32>
               [
                 2,
                 2,
               ],
               PrimitiveArray<Int32>
               [
                 20,
                 21,
               ],
           ],
   ```
   This confuses me: I expected only buffered batches that contain the streamed key to be there, but it looks like that's not the case.
   
   I believe we can do the following:
   - get the current join key from `streamed_batch.join_arrays` using `self.streamed_batch.idx`
   - find all batches in `buffered_data` that contain the join key from step 1
   - if `buffered_data.scanning_batch_idx` equals the number of batches found in step 2 and this batch's `range.end == num_rows`, that probably means SMJ has already emitted all the indices from this batch and we are done for that particular key
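The steps above can be sketched as a small, self-contained Rust check. This is not DataFusion code: `StreamedBatch`, `BufferedBatch`, `BufferedData` and their fields are hypothetical simplifications of the sort-merge join state (plain `Vec<i32>` columns instead of Arrow arrays), and `current_streamed_key`, `batch_contains_key` and `key_fully_emitted` are hypothetical helper names. It also assumes that "this batch" in step 3 means the last buffered batch that contains the key.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Hypothetical stand-in for the streamed side: one Vec per join column.
struct StreamedBatch {
    join_arrays: Vec<Vec<i32>>,
    idx: usize, // current streamed row
}

/// Hypothetical stand-in for a buffered batch.
struct BufferedBatch {
    join_arrays: Vec<Vec<i32>>,
    range: Range<usize>, // rows already scanned for the current key (assumed meaning)
    num_rows: usize,
}

/// Hypothetical stand-in for the buffered side.
struct BufferedData {
    batches: VecDeque<BufferedBatch>,
    scanning_batch_idx: usize,
}

/// Step 1: the (possibly composite) join key at the current streamed row.
fn current_streamed_key(streamed: &StreamedBatch) -> Vec<i32> {
    streamed.join_arrays.iter().map(|col| col[streamed.idx]).collect()
}

/// True if some row of `batch` equals `key` on every join column.
fn batch_contains_key(batch: &BufferedBatch, key: &[i32]) -> bool {
    (0..batch.num_rows).any(|row| {
        batch.join_arrays.iter().zip(key).all(|(col, k)| col[row] == *k)
    })
}

/// Steps 2 and 3: collect matching batches, then apply the proposed check.
fn key_fully_emitted(streamed: &StreamedBatch, buffered: &BufferedData) -> bool {
    let key = current_streamed_key(streamed);
    let matching: Vec<&BufferedBatch> = buffered
        .batches
        .iter()
        .filter(|b| batch_contains_key(b, &key))
        .collect();
    match matching.last() {
        // Assumption: "this batch" is the last batch containing the key.
        Some(last) => {
            buffered.scanning_batch_idx == matching.len() && last.range.end == last.num_rows
        }
        // No buffered batch contains the key, so nothing was emitted for it.
        None => false,
    }
}
```

The linear scan in `batch_contains_key` costs O(rows × join columns) per batch; it only makes the proposed condition concrete and is not meant to be efficient.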



