berkaysynnada commented on code in PR #16905:
URL: https://github.com/apache/datafusion/pull/16905#discussion_r2232864563


##########
datafusion/core/benches/partial_sort_benchmark.rs:
##########
@@ -0,0 +1,239 @@
+use criterion::{black_box, criterion_group, criterion_main, Criterion};
+use datafusion::arrow::array::Int32Array;
+use datafusion::arrow::datatypes::{DataType, Field, Schema};
+use datafusion::arrow::record_batch::RecordBatch;
+use datafusion::datasource::MemTable;
+use datafusion::logical_expr::{col, SortExpr};
+use datafusion::prelude::*;
+use datafusion_common::Result;
+use std::sync::Arc;
+use tokio::runtime::Runtime;
+
+fn create_presorted_data(rows: usize, groups: usize) -> Result<RecordBatch> {

Review Comment:
   Can you share these benchmark results in the PR body, before and after the 
change?
   
   I think we need a more comprehensive analysis before applying this change, 
covering factors such as total row counts, batch sizes, the number of distinct 
prefix values, whether a fetch value is set, the cardinality of the sort 
columns, and parallelism. If you have time, investigating these would be very 
helpful for making the right call.
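
To make the dimensions listed above concrete, here is a rough sketch of a parameter grid the benchmark could iterate over. Everything in it is an assumption for illustration: the `BenchCase` struct, the `bench_cases` helper, and all numeric values are hypothetical and not taken from the PR. It uses only the standard library and does not call any DataFusion or criterion API.

```rust
/// One benchmark configuration. Hypothetical: field names and values are
/// placeholders chosen for illustration, not taken from the PR.
#[derive(Debug, Clone, PartialEq)]
struct BenchCase {
    total_rows: usize,
    batch_size: usize,
    distinct_prefix_values: usize,
    fetch: Option<usize>,
    target_partitions: usize,
}

/// Build the cartesian product of all dimensions, skipping combinations
/// that cannot occur (more distinct prefixes than rows, or a batch larger
/// than the whole input).
fn bench_cases() -> Vec<BenchCase> {
    // Assumed sample values; pick ranges that match real workloads.
    let total_rows = [10_000usize, 1_000_000];
    let batch_sizes = [1024usize, 8192];
    let distinct_prefixes = [10usize, 1_000];
    let fetches = [None, Some(100usize)];
    let partitions = [1usize, 8];

    let mut cases = Vec::new();
    for &rows in &total_rows {
        for &batch in &batch_sizes {
            for &prefixes in &distinct_prefixes {
                for &fetch in &fetches {
                    for &parts in &partitions {
                        if prefixes > rows || batch > rows {
                            continue;
                        }
                        cases.push(BenchCase {
                            total_rows: rows,
                            batch_size: batch,
                            distinct_prefix_values: prefixes,
                            fetch,
                            target_partitions: parts,
                        });
                    }
                }
            }
        }
    }
    cases
}
```

Each case could then drive one benchmark run before and after the change, so the results in the PR body can be compared row by row. Cardinality of the remaining sort columns is left out here and would need its own dimension.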



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

