Kevin-Li-2025 opened a new pull request, #23043: URL: https://github.com/apache/datafusion/pull/23043
## Which issue does this PR close?

- Closes #22799.

## Rationale for this change

`any_value` is a common aggregate in SQL engines for queries that need one representative non-null value from each group without imposing an ordering requirement. DataFusion currently has `first_value`, but that aggregate is order-sensitive, so exposing `any_value` gives users the intended arbitrary-value semantics directly.

## What changes are included in this PR?

- Adds an `any_value(expression)` aggregate UDF and registers it with the default aggregate functions.
- Reuses the existing trivial first-value accumulator with nulls ignored, so evaluation short-circuits after the first non-null value.
- Marks the aggregate as order-insensitive and preserves the input field metadata/type in the return field.
- Adds sqllogictest coverage for scalar, grouped, all-null, empty-input, and string return-type cases.

## Are these changes tested?

Yes. I ran:

```
cargo fmt --all
cargo test -p datafusion-functions-aggregate
cargo test -p datafusion-sqllogictest --test sqllogictests -- aggregate_any_value.slt
cargo clippy --all-targets --all-features -- -D warnings
```

## Are there any user-facing changes?

Yes. This adds a new SQL aggregate function, `any_value`.

I used AI assistance to help inspect the codebase and run validation, and I reviewed the resulting implementation and tests.
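
For readers unfamiliar with the short-circuiting behavior mentioned under "What changes are included in this PR?", the following is a minimal standalone sketch of the intended semantics. It is not the code in this PR and does not use DataFusion's accumulator API: `AnyValueSketch`, `update_batch`, `is_done`, `evaluate`, and the `any_value` helper are hypothetical names chosen for illustration, and `Option<T>` stands in for a nullable column value.

```rust
/// Hypothetical stand-in for an "any value" accumulator.
/// `None` models SQL NULL; this is not DataFusion's `Accumulator` trait.
struct AnyValueSketch<T> {
    value: Option<T>,
}

impl<T: Clone> AnyValueSketch<T> {
    fn new() -> Self {
        Self { value: None }
    }

    /// True once a non-null value has been captured, so later input can be skipped.
    fn is_done(&self) -> bool {
        self.value.is_some()
    }

    /// Consume one batch, keeping the first non-null value seen so far.
    fn update_batch(&mut self, batch: &[Option<T>]) {
        if self.is_done() {
            return; // short-circuit: a representative value is already held
        }
        // `flatten` skips the `None` (NULL) entries in the batch.
        self.value = batch.iter().flatten().next().cloned();
    }

    /// Final result: the captured value, or NULL for empty or all-null input.
    fn evaluate(&self) -> Option<T> {
        self.value.clone()
    }
}

/// Drive the sketch over a sequence of batches, stopping early when possible.
fn any_value<T: Clone>(batches: &[Vec<Option<T>>]) -> Option<T> {
    let mut acc: AnyValueSketch<T> = AnyValueSketch::new();
    for batch in batches {
        if acc.is_done() {
            break;
        }
        acc.update_batch(batch);
    }
    acc.evaluate()
}
```

In this sketch, all-null and empty inputs both evaluate to `None`, which mirrors the all-null and empty-input cases listed in the test coverage. Because no ordering is required, whichever non-null value arrives first is an acceptable result.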
