kosiew opened a new pull request, #22639:
URL: https://github.com/apache/datafusion/pull/22639

   
   
   ## Which issue does this PR close?
   
   * Closes #22638.
   
   ## Rationale for this change
   
   Spark AVG and the built-in AVG implementations maintained separate logic for 
converting input values into group-accumulator state, including null and filter 
handling. This duplication increased the risk of behavioral divergence and made 
maintenance more difficult.
   
   This change introduces a shared helper for AVG state conversion and aligns 
null/filter semantics across both implementations. It also ensures that Spark 
AVG applies filters consistently during state merges, reducing the chance of 
discrepancies between the two implementations.
   
   ## What changes are included in this PR?
   
   * Added a shared AVG state conversion helper in `functions-aggregate-common`:
   
     * `groups_accumulator::avg::convert_to_avg_state`
   * Refactored the built-in AVG implementation to use the shared helper 
instead of maintaining local null-mask and state-conversion logic.
   * Refactored the Spark AVG implementation to use the same shared helper for 
`convert_to_state`.
   * Updated Spark AVG `merge_batch` to skip rows that are:
   
     * null in the partial count state,
     * null in the partial sum state,
     * filtered out by a merge filter (including `NULL` filter values).
   * Preserved implementation-specific state ordering:
   
     * Built-in AVG: `[count, sum]`
     * Spark AVG: `[sum, count]`
   * Added regression tests covering:
   
     * propagation of input nulls into AVG state,
     * propagation of filter nulls into AVG state,
     * preservation of sum data types,
     * state ordering expectations,
     * Spark AVG merge behavior with filters.
   
   ## Are these changes tested?
   
   Yes.
   
   New unit tests were added for the shared helper and both AVG implementations:
   
   ### `functions-aggregate-common`
   
   * `convert_to_avg_state_applies_input_nulls_to_sum_and_count`
   * `convert_to_avg_state_applies_filter_nulls_to_sum_and_count`
   * `convert_to_avg_state_preserves_sum_data_type`
   
   ### Built-in AVG (`functions-aggregate`)
   
   * `float64_convert_to_state_uses_count_sum_order_and_null_filter`
   * `decimal_convert_to_state_preserves_sum_type_and_nulls`
   * `duration_convert_to_state_preserves_sum_type_and_applies_filter`
   
   ### Spark AVG
   
   * `convert_to_state_with_null_filter`
   * `merge_batch_applies_filter`
   
   Existing AVG tests continue to validate state round-tripping and related 
behavior. 
   
   ## Are there any user-facing changes?
   
   No user-facing changes are intended. This PR refactors and aligns internal 
AVG group-accumulator state handling and adds regression coverage for null and 
filter semantics.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to