Re: [PR] chore: Add end-to-end benchmark for array_agg, code cleanup [datafusion]

via GitHub Mon, 23 Feb 2026 10:12:04 -0800


neilconway commented on code in PR #20496:
URL: https://github.com/apache/datafusion/pull/20496#discussion_r2842335322



##########
datafusion/core/benches/aggregate_query_sql.rs:
##########
@@ -251,6 +251,39 @@ fn criterion_benchmark(c: &mut Criterion) {
             )
         })
     });
+
+    c.bench_function("array_agg_query_group_by_few_groups", |b| {

Review Comment:
   In the query, it is the number of groups; for such a simple query, the
number of groups follows directly from the distribution of the data. `wide`,
`mid`, and `narrow` are, respectively:
   
   ```rust
           // Integers randomly selected from a wide range of values, i.e. [0,
           // u64::MAX], such that there are ~no repeated values.
           Field::new("u64_wide", DataType::UInt64, false),
           // Integers randomly selected from a mid-range of values [0, 1000),
           // providing ~1000 distinct groups.
           Field::new("u64_mid", DataType::UInt64, false),
           // Integers randomly selected from a narrow range of values such that
           // there are a few distinct values, but they are repeated often.
           Field::new("u64_narrow", DataType::UInt64, false),
   ```
   
   `wide` and `narrow` existed already; I just wanted a workload somewhere in 
the middle when profiling `array_agg()`.
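
To illustrate how the value range alone determines the group count, here is a minimal std-only sketch. It is not the benchmark's actual data generator: the `XorShift64` helper, the `distinct_groups` function, the row count, and the narrow range of 10 values are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// (Hypothetical helper; the real benchmark uses its own data generator.)
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Draw `rows` values and count the distinct ones, i.e. the number of groups
/// a `GROUP BY` on such a column would produce.
/// `range = Some(r)` restricts values to [0, r); `None` uses the full u64 range.
fn distinct_groups(rows: usize, range: Option<u64>, seed: u64) -> usize {
    let mut rng = XorShift64(seed);
    let mut seen = HashSet::new();
    for _ in 0..rows {
        let v = rng.next();
        seen.insert(match range {
            Some(r) => v % r,
            None => v,
        });
    }
    seen.len()
}

fn main() {
    let rows = 100_000; // assumed row count, for illustration only
    println!("wide:   {} groups", distinct_groups(rows, None, 42));
    println!("mid:    {} groups", distinct_groups(rows, Some(1000), 42));
    println!("narrow: {} groups", distinct_groups(rows, Some(10), 42)); // assumed range
}
```

With these assumptions, the wide column should yield about one group per row, the mid column at most 1000 groups, and the narrow column at most 10, which is the spread of workloads the three benchmark variants are meant to cover.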



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

