comphead commented on code in PR #10784:
URL: https://github.com/apache/datafusion/pull/10784#discussion_r1629770348


##########
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##########
@@ -1500,11 +1497,13 @@ fn get_filtered_join_mask(
                     filter_matched_indices.push(streamed_indices.value(i));
                 }
 
-                // if switched to next streaming index(e.g. from 0 to 1, or 
from 1 to 2), we reset seen_as_true flag
+                // Reset `seen_as_true` flag and calculate mask for the 
current streaming index
+                // - if within the batch it switched to next streaming 
index(e.g. from 0 to 1, or from 1 to 2)
+                // - if it is at the end of the all buffered batches for the 
given streaming index, 0 index comes last
                 if (i < streamed_indices_length - 1
                     && streamed_indices.value(i) != streamed_indices.value(i + 
1))
                     || (i == streamed_indices_length - 1
-                        && *scanning_buffered_batch_idx == 
buffered_batches_len - 1)
+                        && *scanning_buffered_offset == 0)

Review Comment:
   Thanks @viirya for the review. We exactly need to check the condition where 
all buffered batches are scanned for current streamed batch. This is because 
LeftAnti doesnt know if it matches or not until the very last buffered row 
comes in. This scenario already tested in slt file
   ```
   query II
   select * from (
   with
   t1 as (
       select 11 a, 12 b),
   t2 as (
       select 11 a, 12 c union all
       select 11 a, 11 c union all
       select 11 a, 15 c
       )
   select t1.* from t1 where not exists (select 1 from t2 where t2.a = t1.a and 
t1.b > t2.c)
   ) order by 1, 2
   ----
   ```
   
   it works for small and large batches.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to