kosiew opened a new pull request, #22681:
URL: https://github.com/apache/datafusion/pull/22681
## Which issue does this PR close?
* Part of #22665
## Rationale for this change
Grouped `first_value` and `last_value` used a local
`BooleanArray::value(idx)` check when evaluating aggregate `FILTER` predicates.
This reads only the value bit and does not account for NULL validity, which can
incorrectly treat a NULL filter value with a `true` value bit as passing.
DataFusion's aggregate FILTER semantics require that only `Some(true)`
passes; both `Some(false)` and `None` should reject the row. This change reuses
the existing shared filter validity logic to ensure grouped `first_value` and
`last_value` behave consistently with other aggregate implementations.
## What changes are included in this PR?
* Made `filter_to_validity` publicly accessible from
`datafusion-functions-aggregate-common` so it can be reused by grouped
aggregate implementations.
* Updated `FirstLastGroupsAccumulator` to:
* Precompute filter validity using `filter_to_validity`.
* Evaluate FILTER predicates using the shared validity bitmap rather than
`BooleanArray::value(idx)`.
* Added targeted unit test helpers for grouped `first_value` / `last_value`
accumulators.
* Added regression tests covering nullable FILTER predicates where the value
bit is `true` but the validity bit is NULL:
* `test_first_group_acc_rejects_null_filter_with_true_value_bit`
* `test_last_group_acc_rejects_null_filter_with_true_value_bit`
* `test_first_group_acc_merge_rejects_null_filter_with_true_value_bit`
## Are these changes tested?
Yes.
This PR adds the following unit tests:
* `test_first_group_acc_rejects_null_filter_with_true_value_bit`
* `test_last_group_acc_rejects_null_filter_with_true_value_bit`
* `test_first_group_acc_merge_rejects_null_filter_with_true_value_bit`
These tests verify that rows with a NULL FILTER value do not pass even when
the underlying boolean value bit is `true`, and that the same semantics are
preserved in the merge path.
## Are there any user-facing changes?
No user-facing changes are intended.
This is a correctness fix that aligns grouped `first_value` and `last_value`
aggregate FILTER evaluation with existing DataFusion aggregate semantics for
nullable filter predicates.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]