alamb commented on code in PR #18766:
URL: https://github.com/apache/datafusion/pull/18766#discussion_r2586402844


##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -194,7 +194,9 @@ impl SkipAggregationProbe {
         if self.input_rows >= self.probe_rows_threshold {
             self.should_skip = self.num_groups as f64 / self.input_rows as f64
                 >= self.probe_ratio_threshold;
-            self.is_locked = true;
+            // Set is_locked to true only if we have decided to skip, 
otherwise we can try to skip
+            // during processing the next record_batch.
+            self.is_locked = self.should_skip;

Review Comment:
   Another way to write this that might make it clearer what the valid states 
are could be
   
   ```rust
               if self.num_groups as f64 / self.input_rows as f64 >= 
self.probe_ratio_threshold {
                 self.should_skip = true;
                 self.is_locked = true;
               }
   ```
   
   We could also potentially make it a tri state enum rather than two flags
   ```rust 
   enum ProbeState {
     // actively probing, not skipping, checking threshold
     Active,
     // Locked, not skipping, no longer checking threshold
     Locked, 
     // Skipping initial phase, no longer checking threshold
     Skipping
   }
   ```



##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1395,4 +1398,150 @@ mod tests {
 
         Ok(())
     }
+
+    #[tokio::test]
+    async fn test_skip_aggregation_probe_not_locked_until_skip() -> Result<()> 
{

Review Comment:
   I verified that this test covers the change by removing the code and then 
this test fails like
   
   ```
   thread 
'aggregates::row_hash::tests::test_skip_aggregation_probe_not_locked_until_skip'
 (5189676) panicked at 
datafusion/physical-plan/src/aggregates/row_hash.rs:1538:9:
   assertion `left == right` failed: Expected batch 3's rows (100) to be skipped
     left: 0
    right: 100
   stack backtrace:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to