viirya commented on code in PR #10784:
URL: https://github.com/apache/datafusion/pull/10784#discussion_r1630110479


##########
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##########
@@ -1500,11 +1497,13 @@ fn get_filtered_join_mask(
                     filter_matched_indices.push(streamed_indices.value(i));
                 }
 
-                // if switched to next streaming index(e.g. from 0 to 1, or 
from 1 to 2), we reset seen_as_true flag
+                // Reset `seen_as_true` flag and calculate mask for the 
current streaming index
+                // - if within the batch it switched to next streaming 
index(e.g. from 0 to 1, or from 1 to 2)
+                // - if it is at the end of the all buffered batches for the 
given streaming index, 0 index comes last
                 if (i < streamed_indices_length - 1
                     && streamed_indices.value(i) != streamed_indices.value(i + 
1))
                     || (i == streamed_indices_length - 1
-                        && *scanning_buffered_batch_idx == 
buffered_batches_len - 1)
+                        && *scanning_buffered_offset == 0)

Review Comment:
   > We exactly need to check the condition where all buffered batches are 
scanned for current streamed batch.
   
   This is correct, but what I meant is that `scanning_buffered_offset == 0` 
condition could also be true when it moves to next buffered batch and output 
size is equal to batch size. When it happens, SMJ also goes to output batches, 
but obviously not all buffered batches are scanned for current row in the 
streamed batch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to