kosiew opened a new issue, #22665:
URL: https://github.com/apache/datafusion/issues/22665
## Summary
Aggregate FILTER row-validity handling is currently fragmented across helper
and grouped accumulator paths. We should centralize the invariant that a row
passes aggregate FILTER if and only if the predicate is Some(true).
## Current State
- `filter_to_validity` was introduced and grouped-accumulator helper paths
were updated (including `accumulate_multiple` and `filter_to_nulls` plumbing)
in #22068
- `first_last.rs` still uses local filter checks in grouped paths.
- `accumulate_indices` still repeats 64-bit validity iteration logic across
multiple branches.
## Problem
Duplicated nullable-filter logic increases the risk of semantic drift and
regressions (especially around NULL predicate handling), and makes validity
bitmap code harder to maintain.
Because grouped and non-grouped paths can evolve independently, this drift
can become user-visible as incorrect aggregate results under nullable
predicates.
## Proposal
1. Add one shared helper/iterator in `functions-aggregate-common` that
encodes:
- row passes FILTER iff predicate is `Some(true)`
2. Migrate grouped `first_value` / `last_value` filter paths to this shared
utility.
3. Refactor `accumulate_indices` to reuse shared validity iteration and
remove duplicated loops.
4. Migrate remaining grouped accumulators incrementally.
## Scope
In scope:
- Shared FILTER validity/predicate utility.
- Targeted migration of `first_last` grouped paths.
- Targeted cleanup in `accumulate_indices`.
- Focused test coverage for nullable FILTER predicates.
Out of scope:
- Broad performance rewrites not tied to FILTER semantics.
- Large cross-workspace refactors.
## Acceptance Criteria
1. A single reusable utility encodes FILTER pass criteria as `Some(true)`.
2. Grouped `first_value` / `last_value` no longer use bespoke nullable
filter checks.
3. `accumulate_indices` no longer carries duplicated validity loop logic
that can drift semantically.
4. Existing aggregate FILTER tests pass.
5. New tests cover nullable FILTER predicates for grouped first/last
behavior.
6. No regression for non-null filter predicates.
## Testing
- Add SQLLogicTests for grouped `first_value` and `last_value` with nullable
FILTER predicates.
- Include cases where value bits may be true while validity is NULL.
- Validate result is NULL when no rows satisfy `Some(true)`.
- Run relevant aggregate crate tests and sqllogictest aggregate coverage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]