comphead commented on PR #12082: URL: https://github.com/apache/datafusion/pull/12082#issuecomment-2318961103
> > if there is a left streamed row with join key (1) from the right side we gonna have joined buffered batches where range shows what indices share the same join key.
> >
> > For example
> > ```
> > Streamed data        Buffered data
> > [1]           ->     [0, 1, 1], [1, 1, 2]
> > ```
> >
> > Should have ranges `[1..3], [0..2]`
>
> I don't get the question clearly.
>
> You have `[0, 1, 1]` as buffered indices for same streamed row? Why you have same buffered row id `1` twice?

Thanks @viirya, those are not indices, they are the raw data. Let me rephrase it.

If I have a left table

| a | b |
| ---- | ---- |
| 10 | 20 |

and a right table

| a | b |
| ---- | ---- |
| 5 | 20 |
| 10 | 20 |
| 10 | 21 |
| 10 | 21 |
| 10 | 22 |
| 15 | 22 |

where the join key is `a` and the filter is on column `b`, then in `freeze_streamed` I can observe the right table arriving as 3 batches:

1. Batch 1: `join_array` [10], range 1..3, which is correct, as row numbers 1 and 2 relate to join key 10
2. Batch 2: `join_array` [10], range 0..2, which is correct, as row numbers 0 and 1 relate to join key 10
3. Batch 3: `join_array` [15], range 0..1, which is weird: why is this batch associated with the streamed row?
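
To make the expectation concrete, here is a minimal standalone sketch of how I read the ranges. It is not DataFusion code: `matching_ranges` is a hypothetical helper, and it assumes each buffered batch is a plain vector of join-key values already sorted ascending. For a given streamed key it returns, per batch, the contiguous range of rows that share that key, and it skips batches with no match.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of DataFusion): for each buffered batch of
/// join-key values, sorted ascending, return the batch index together with
/// the contiguous range of rows whose key equals `key`.
/// Batches that contain no matching row are skipped.
fn matching_ranges(batches: &[Vec<i32>], key: i32) -> Vec<(usize, Range<usize>)> {
    batches
        .iter()
        .enumerate()
        .filter_map(|(batch_idx, keys)| {
            // Because the keys are sorted, rows smaller than `key` form a
            // prefix, and rows smaller than or equal to `key` form a longer
            // (or equal) prefix. The gap between the two is the match range.
            let start = keys.partition_point(|&k| k < key);
            let end = keys.partition_point(|&k| k <= key);
            (start < end).then(|| (batch_idx, start..end))
        })
        .collect()
}
```

With the buffered data from the quoted example, `matching_ranges(&[vec![0, 1, 1], vec![1, 1, 2]], 1)` yields `(0, 1..3)` and `(1, 0..2)`, which are the ranges I expected. Under the same assumptions, a batch that only holds key 15 produces no range at all for streamed key 10, which is why the third range above looks surprising to me.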