neilconway commented on code in PR #20496:
URL: https://github.com/apache/datafusion/pull/20496#discussion_r2842335322
##########
datafusion/core/benches/aggregate_query_sql.rs:
##########
@@ -251,6 +251,39 @@ fn criterion_benchmark(c: &mut Criterion) {
)
})
});
+
+ c.bench_function("array_agg_query_group_by_few_groups", |b| {
Review Comment:
In the query, it is the # of groups; of course, for such a simple query the #
of groups follows directly from the distribution of the data. `wide`, `mid`,
and `narrow` are, respectively:
```rust
// Integers randomly selected from a wide range of values, i.e. [0,
// u64::MAX], such that there are ~no repeated values.
Field::new("u64_wide", DataType::UInt64, false),
// Integers randomly selected from a mid-range of values [0, 1000),
// providing ~1000 distinct groups.
Field::new("u64_mid", DataType::UInt64, false),
// Integers randomly selected from a narrow range of values such that
// there are a few distinct values, but they are repeated often.
Field::new("u64_narrow", DataType::UInt64, false),
```
`wide` and `narrow` already existed; I just wanted a workload somewhere in
between when profiling `array_agg()`.
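The link between a column's value range and the number of groups, and therefore the size of each `array_agg()` list, can be sketched in plain Rust. This is a hypothetical illustration, not the benchmark's data generator. `XorShift64` and `group_stats` are made-up names. The narrow range of 10 values is an assumption, since the real `u64_narrow` range isn't stated here. A `HashMap<u64, Vec<u64>>` stands in for grouping by the column and collecting values with `array_agg()`.

```rust
use std::collections::HashMap;

/// Minimal xorshift64 PRNG so the sketch needs no external crates
/// (a hypothetical stand-in for the benchmark's real RNG).
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Draw `n` values from `[0, range)`, or from the full u64 domain when `range`
/// is None. Then emulate `GROUP BY key` with `array_agg(key)` using a HashMap.
/// Returns (number of groups, length of the largest per-group list).
fn group_stats(n: usize, range: Option<u64>, seed: u64) -> (usize, usize) {
    let mut rng = XorShift64(seed); // seed must be nonzero for xorshift
    let mut groups: HashMap<u64, Vec<u64>> = HashMap::new();
    for _ in 0..n {
        let raw = rng.next();
        let key = match range {
            Some(r) => raw % r, // slight modulo bias doesn't matter here
            None => raw,
        };
        groups.entry(key).or_default().push(key);
    }
    let max_len = groups.values().map(Vec::len).max().unwrap_or(0);
    (groups.len(), max_len)
}

fn main() {
    let n = 100_000;
    // "narrow" = 10 distinct values is an assumed range, for illustration only.
    for (name, range) in [("wide", None), ("mid", Some(1000)), ("narrow", Some(10))] {
        let (num_groups, max_len) = group_stats(n, range, 0x9E37_79B9_7F4A_7C15);
        println!("{name:>6}: {num_groups} groups, largest array_agg list = {max_len}");
    }
}
```

The sketch shows three regimes. `wide` yields about one group per row, each holding a one-element list. `mid` yields about 1000 groups of roughly 100 values each. `narrow` yields a handful of groups with very large lists. These stress different parts of an `array_agg()` implementation: per-group overhead in the first case and list growth in the last.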