viirya commented on code in PR #10784: URL: https://github.com/apache/datafusion/pull/10784#discussion_r1630110479
########## datafusion/physical-plan/src/joins/sort_merge_join.rs: ########## @@ -1500,11 +1497,13 @@ fn get_filtered_join_mask( filter_matched_indices.push(streamed_indices.value(i)); } - // if switched to next streaming index(e.g. from 0 to 1, or from 1 to 2), we reset seen_as_true flag + // Reset `seen_as_true` flag and calculate mask for the current streaming index + // - if within the batch it switched to next streaming index(e.g. from 0 to 1, or from 1 to 2) + // - if it is at the end of the all buffered batches for the given streaming index, 0 index comes last if (i < streamed_indices_length - 1 && streamed_indices.value(i) != streamed_indices.value(i + 1)) || (i == streamed_indices_length - 1 - && *scanning_buffered_batch_idx == buffered_batches_len - 1) + && *scanning_buffered_offset == 0) Review Comment: > We exactly need to check the condition where all buffered batches are scanned for current streamed batch. This is correct, but what I meant is that `scanning_buffered_offset == 0` condition could also be true when it moves to next buffered batch and output size is equal to batch size. When it happens, SMJ also goes to output batches, but obviously not all buffered batches are scanned for current row in the streamed batch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org