alamb commented on code in PR #18766:
URL: https://github.com/apache/datafusion/pull/18766#discussion_r2586402844
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -194,7 +194,9 @@ impl SkipAggregationProbe {
if self.input_rows >= self.probe_rows_threshold {
self.should_skip = self.num_groups as f64 / self.input_rows as f64
>= self.probe_ratio_threshold;
- self.is_locked = true;
+ // Set is_locked to true only if we have decided to skip,
otherwise we can try to skip
+ // during processing the next record_batch.
+ self.is_locked = self.should_skip;
Review Comment:
Another way to write this that might make it clearer what the valid states
are could be
```rust
if self.num_groups as f64 / self.input_rows as f64 >=
self.probe_ratio_threshold {
self.should_skip = true;
self.is_locked = true;
}
```
We could also potentially make it a tri state enum rather than two flags
```rust
enum ProbeState {
// actively probing, not skipping, checking threshold
Active,
// Locked, not skipping, no longer checking threshold
Locked,
// Skipping initial phase, no longer checking threshold
Skipping
}
```
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1395,4 +1398,150 @@ mod tests {
Ok(())
}
+
+ #[tokio::test]
+ async fn test_skip_aggregation_probe_not_locked_until_skip() -> Result<()>
{
Review Comment:
I verified that this test covers the change by removing the code and then
this test fails like
```
thread
'aggregates::row_hash::tests::test_skip_aggregation_probe_not_locked_until_skip'
(5189676) panicked at
datafusion/physical-plan/src/aggregates/row_hash.rs:1538:9:
assertion `left == right` failed: Expected batch 3's rows (100) to be skipped
left: 0
right: 100
stack backtrace:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]