neilconway opened a new pull request, #21154:
URL: https://github.com/apache/datafusion/pull/21154
## Which issue does this PR close?
- Closes #17789.
## Rationale for this change
`string_agg` previously didn't support `GroupsAccumulator`; adding support
for it can significantly improve performance, particularly when there are many
groups.
Benchmarks (M4 Max):
- string_agg_query_group_by_few_groups (~10): 645 µs → 564 µs, -11%
- string_agg_query_group_by_mid_groups (~1,000): 2,692 µs → 871 µs, -68%
- string_agg_query_group_by_many_groups (~65,000): 16,606 µs → 1,147 µs, -93%
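
To illustrate why a groups-aware accumulator helps as the group count grows, here is a minimal, self-contained sketch. It is not the code in this PR and does not use DataFusion's actual `GroupsAccumulator` trait or Arrow arrays; the type `GroupedStringAgg` and its methods are hypothetical names chosen for illustration. The assumption being illustrated is that a single accumulator keeps the state for all groups in one vector indexed by group, so each input batch is processed in one pass instead of dispatching to a separate accumulator object per group.

```rust
/// Hypothetical, simplified stand-in for a groups-aware `string_agg`
/// accumulator. One instance holds the state for *all* groups.
struct GroupedStringAgg {
    delimiter: String,
    /// One slot per group; `None` until the group sees a non-null value.
    groups: Vec<Option<String>>,
}

impl GroupedStringAgg {
    fn new(delimiter: &str) -> Self {
        Self {
            delimiter: delimiter.to_string(),
            groups: Vec::new(),
        }
    }

    /// Process one batch: `values[i]` belongs to group `group_indices[i]`.
    /// Null inputs (`None`) are skipped, matching SQL `string_agg` semantics.
    fn update_batch(
        &mut self,
        values: &[Option<&str>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the per-group state if new groups have appeared.
        if self.groups.len() < total_num_groups {
            self.groups.resize(total_num_groups, None);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            let Some(value) = *value else { continue };
            if let Some(acc) = self.groups[group].as_mut() {
                // Group already has content: append delimiter, then value.
                acc.push_str(&self.delimiter);
                acc.push_str(value);
            } else {
                // First non-null value for this group.
                self.groups[group] = Some(value.to_string());
            }
        }
    }

    /// Produce one result per group; groups with no non-null input stay NULL.
    fn evaluate(self) -> Vec<Option<String>> {
        self.groups
    }
}
```

In this sketch, the caller would invoke `update_batch` once per input batch and `evaluate` once at the end. A real implementation would operate on Arrow string arrays and also handle filters, state merging, and memory accounting, all of which are omitted here.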
## What changes are included in this PR?
* Add end-to-end benchmark for `string_agg`
* Implement `GroupsAccumulator` API for `string_agg`
* Add unit tests
* Minor code cleanup for existing `string_agg` code paths
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No, other than a change to an error message string.