comphead commented on PR #12082: URL: https://github.com/apache/datafusion/pull/12082#issuecomment-2325221509
> At what point in the code you are able to observe `0..1` for the key 2?

I'm running the test from https://github.com/apache/datafusion/pull/12082#issuecomment-2319361185 and debugging the `freeze_streamed` function. For batch size 2, I'm seeing a batch distribution like the one in https://github.com/apache/datafusion/pull/12082#issuecomment-2319492383

There you can see a buffered batch with these join arrays:

```
join_arrays: [
    PrimitiveArray<Int32>
    [
      2,
      2,
    ],
    PrimitiveArray<Int32>
    [
      20,
      21,
    ],
],
```

This confuses me: I thought only buffered batches that contain the streaming key would be there, but it looks like that is not the case.

I believe we can do the following:

- get the current join key from `streamed_batch.join_arrays` at `self.streamed_batch.idx`
- find all batches in `buffered_data` that contain the join key from step 1
- if `buffered_data.scanning_batch_idx` equals the number of batches from step 2 and this batch has `range.end == num_rows`, that probably means SMJ has already emitted all the indices from this batch and we are done for that particular key
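To make the three steps concrete, here is a rough, self-contained sketch. It is not DataFusion code: `StreamedBatch`, `BufferedBatch`, and `BufferedData` are simplified stand-ins with a single `i32` join column, and `key_fully_emitted` is a hypothetical helper name. It also assumes that "this batch" in step 3 means the last batch found in step 2, which may not be what the real implementation needs.

```rust
use std::ops::Range;

/// Simplified stand-in for a buffered batch (assumption: one i32 join column).
struct BufferedBatch {
    join_keys: Vec<i32>,
    /// Row range matched so far for the current streamed row.
    range: Range<usize>,
}

impl BufferedBatch {
    fn num_rows(&self) -> usize {
        self.join_keys.len()
    }
}

/// Simplified stand-in for the buffered side of the join.
struct BufferedData {
    batches: Vec<BufferedBatch>,
    scanning_batch_idx: usize,
}

/// Simplified stand-in for the streamed side of the join.
struct StreamedBatch {
    join_keys: Vec<i32>,
    idx: usize,
}

/// Hypothetical helper: true when all buffered rows for the current
/// streamed key appear to have been emitted already.
fn key_fully_emitted(streamed: &StreamedBatch, buffered: &BufferedData) -> bool {
    // Step 1: current join key of the streamed row.
    let key = streamed.join_keys[streamed.idx];

    // Step 2: buffered batches that contain that key.
    let matching: Vec<&BufferedBatch> = buffered
        .batches
        .iter()
        .filter(|b| b.join_keys.contains(&key))
        .collect();

    // Step 3: scanning has moved past every matching batch, and the last
    // matching batch has its range extended to the final row.
    match matching.last() {
        Some(last) => {
            buffered.scanning_batch_idx == matching.len()
                && last.range.end == last.num_rows()
        }
        None => false,
    }
}
```

With a buffered batch holding keys `[2, 2]`, the check only passes once the range covers both rows (`0..2`); a range of `0..1`, as in the quoted question, is treated as "not done yet".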