kosiew opened a new pull request, #22639:
URL: https://github.com/apache/datafusion/pull/22639
## Which issue does this PR close?
* Closes #22638.
## Rationale for this change
Spark AVG and the built-in AVG implementations maintained separate logic for
converting input values into group-accumulator state, including null and filter
handling. This duplication increased the risk of behavioral divergence and made
maintenance more difficult.
This change introduces a shared helper for AVG state conversion and aligns
null/filter semantics across both implementations. It also ensures that Spark
AVG applies filters consistently during state merges, reducing the chance of
discrepancies between the two implementations.
## What changes are included in this PR?
* Added a shared AVG state conversion helper in `functions-aggregate-common`:
* `groups_accumulator::avg::convert_to_avg_state`
* Refactored the built-in AVG implementation to use the shared helper
instead of maintaining local null-mask and state-conversion logic.
* Refactored the Spark AVG implementation to use the same shared helper for
`convert_to_state`.
* Updated Spark AVG `merge_batch` to skip rows that are:
* null in the partial count state,
* null in the partial sum state,
* filtered out by a merge filter (including `NULL` filter values).
* Preserved implementation-specific state ordering:
* Built-in AVG: `[count, sum]`
* Spark AVG: `[sum, count]`
* Added regression tests covering:
* propagation of input nulls into AVG state,
* propagation of filter nulls into AVG state,
* preservation of sum data types,
* state ordering expectations,
* Spark AVG merge behavior with filters.
## Are these changes tested?
Yes.
New unit tests were added for the shared helper and both AVG implementations:
### `functions-aggregate-common`
* `convert_to_avg_state_applies_input_nulls_to_sum_and_count`
* `convert_to_avg_state_applies_filter_nulls_to_sum_and_count`
* `convert_to_avg_state_preserves_sum_data_type`
### Built-in AVG (`functions-aggregate`)
* `float64_convert_to_state_uses_count_sum_order_and_null_filter`
* `decimal_convert_to_state_preserves_sum_type_and_nulls`
* `duration_convert_to_state_preserves_sum_type_and_applies_filter`
### Spark AVG
* `convert_to_state_with_null_filter`
* `merge_batch_applies_filter`
Existing AVG tests continue to validate state round-tripping and related
behavior.
## Are there any user-facing changes?
No user-facing changes are intended. This PR refactors and aligns internal
AVG group-accumulator state handling and adds regression coverage for null and
filter semantics.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]