berkaysynnada commented on code in PR #16905:
URL: https://github.com/apache/datafusion/pull/16905#discussion_r2232864563
##########
datafusion/core/benches/partial_sort_benchmark.rs:
##########
@@ -0,0 +1,239 @@
+use criterion::{black_box, criterion_group, criterion_main, Criterion};
+use datafusion::arrow::array::Int32Array;
+use datafusion::arrow::datatypes::{DataType, Field, Schema};
+use datafusion::arrow::record_batch::RecordBatch;
+use datafusion::datasource::MemTable;
+use datafusion::logical_expr::{col, SortExpr};
+use datafusion::prelude::*;
+use datafusion_common::Result;
+use std::sync::Arc;
+use tokio::runtime::Runtime;
+
+fn create_presorted_data(rows: usize, groups: usize) -> Result<RecordBatch> {

Review Comment:
   Can you share these benchmark results in the PR body, before and after the change? I think we need a more comprehensive analysis before applying this change, covering factors such as total row count, batch size, number of distinct prefix values, whether a fetch value is set, cardinality of the sort columns, and parallelism. If you have time, investigating these would be very helpful for making the right call.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
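To make the requested analysis concrete, the dimensions named in the review comment could be enumerated as a parameter grid, with one benchmark run per combination. The following is only a minimal sketch using the standard library: `BenchCase`, `benchmark_grid`, the field names, and every numeric value are hypothetical placeholders and are not taken from the PR. The `target_partitions` field is an assumed stand-in for "parallelism". The cardinality of the non-prefix sort columns could be added as another field in the same way.

```rust
/// One point in a hypothetical benchmark grid. The field names are
/// illustrative and do not correspond to any type in DataFusion or the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchCase {
    pub total_rows: usize,
    pub batch_size: usize,
    pub distinct_prefixes: usize,
    pub fetch: Option<usize>,
    pub target_partitions: usize,
}

impl BenchCase {
    /// Builds a stable label so "before" and "after" runs can be matched up.
    pub fn id(&self) -> String {
        let fetch = match self.fetch {
            Some(n) => n.to_string(),
            None => "none".to_string(),
        };
        format!(
            "rows={}/batch={}/prefixes={}/fetch={}/partitions={}",
            self.total_rows,
            self.batch_size,
            self.distinct_prefixes,
            fetch,
            self.target_partitions
        )
    }
}

/// Enumerates the cartesian product of the dimensions listed in the review.
/// All values below are placeholder assumptions, not measured settings.
pub fn benchmark_grid() -> Vec<BenchCase> {
    let total_rows = [1_000, 1_000_000];
    let batch_sizes = [1_024, 8_192];
    let distinct_prefixes = [1, 100, 10_000];
    let fetches = [None, Some(100)];
    let partitions = [1, 8];

    let mut cases = Vec::new();
    for &rows in &total_rows {
        for &batch in &batch_sizes {
            for &prefixes in &distinct_prefixes {
                // A column cannot hold more distinct prefix values than rows.
                if prefixes > rows {
                    continue;
                }
                for &fetch in &fetches {
                    for &parts in &partitions {
                        cases.push(BenchCase {
                            total_rows: rows,
                            batch_size: batch,
                            distinct_prefixes: prefixes,
                            fetch,
                            target_partitions: parts,
                        });
                    }
                }
            }
        }
    }
    cases
}
```

Each `BenchCase::id()` could then serve as the benchmark name, so that results from before and after the change line up row by row in a table in the PR body.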