kosiew commented on code in PR #22852:
URL: https://github.com/apache/datafusion/pull/22852#discussion_r3385356475
##########
datafusion/physical-plan/src/topk/mod.rs:
##########
@@ -1196,6 +1202,100 @@ mod tests {
Ok(())
}
+ /// Regression test for #22849: a batch whose rows are entirely rejected
by the
+ /// heap's dynamic filter must still trigger `attempt_early_completion`
when its
+ /// last row's prefix is worse than the heap's worst.
+ ///
+ /// Before the fix, the `!filter.has_true()` short-circuit returned
without calling
+ /// `attempt_early_completion`. Because the heap's filter is itself
derived from the
+ /// heap's worst row, a batch from a strictly-worse prefix is exactly the
case the
+ /// filter rejects entirely — i.e. the very signal the early-exit was
designed to
+ /// detect was being silently dropped.
+ #[tokio::test]
+ async fn test_try_finish_fires_when_filter_rejects_entire_batch() ->
Result<()> {
Review Comment:
Nice regression coverage. One small maintainability thought: this repeats a
lot of the setup from `test_try_finish_marks_finished_with_prefix`. A small
helper for the `(a, b)` schema, sort expressions, and `TopK` construction could
make future TopK prefix tests shorter and keep the main scenario easier to spot.
##########
datafusion/physical-plan/src/topk/mod.rs:
##########
@@ -255,7 +255,13 @@ impl TopK {
let array = filtered.into_array(num_rows)?;
let mut filter = array.as_boolean().clone();
if !filter.has_true() {
Review Comment:
This comment is accurate, but it feels a bit longer than the control flow it
explains. I think we could keep just the key invariant here: the heap is
unchanged, but the rejected batch can still prove prefix completion.
For example:
```rust
// The heap is unchanged, but a fully rejected batch can still prove that
// the shared sort prefix has passed the heap boundary.
self.attempt_early_completion(&batch)?;
return Ok(());
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]