Re: [PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

via GitHub Sat, 26 Jul 2025 04:54:07 -0700


berkaysynnada commented on code in PR #16905:
URL: https://github.com/apache/datafusion/pull/16905#discussion_r2232864563



##########
datafusion/core/benches/partial_sort_benchmark.rs:
##########
@@ -0,0 +1,239 @@
+use criterion::{black_box, criterion_group, criterion_main, Criterion};
+use datafusion::arrow::array::Int32Array;
+use datafusion::arrow::datatypes::{DataType, Field, Schema};
+use datafusion::arrow::record_batch::RecordBatch;
+use datafusion::datasource::MemTable;
+use datafusion::logical_expr::{col, SortExpr};
+use datafusion::prelude::*;
+use datafusion_common::Result;
+use std::sync::Arc;
+use tokio::runtime::Runtime;
+
+fn create_presorted_data(rows: usize, groups: usize) -> Result<RecordBatch> {

Review Comment:
   Can you share these benchmark results in the PR body, before and after the 
change?
   
   I think we need a more comprehensive analysis before applying this change, 
covering factors such as total row count, batch size, number of distinct prefix 
values, presence of a fetch value, cardinality of the sort columns, parallelism, 
etc. If you have time, investigating these would be very helpful for making the 
right call.
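
   To make the dimensions above concrete, here is a minimal, std-only sketch of how the benchmark matrix could be enumerated. Everything in it is an assumption for illustration: `BenchCase`, `bench_matrix`, and `presorted_prefix` are hypothetical names that are not part of DataFusion or this PR, and the parameter values are placeholders rather than recommendations.

```rust
/// One point in the benchmark matrix (hypothetical type, not a DataFusion API).
#[derive(Debug, Clone, PartialEq)]
pub struct BenchCase {
    pub total_rows: usize,
    pub batch_size: usize,
    /// Number of distinct values in the already-sorted prefix column.
    pub distinct_prefixes: usize,
    /// Optional LIMIT (fetch) applied on top of the sort.
    pub fetch: Option<usize>,
    /// Degree of parallelism to run the query with.
    pub target_partitions: usize,
}

impl BenchCase {
    /// Label that identifies the case in benchmark output.
    pub fn id(&self) -> String {
        let fetch = match self.fetch {
            Some(n) => n.to_string(),
            None => "none".to_string(),
        };
        format!(
            "rows={}/batch={}/prefixes={}/fetch={}/partitions={}",
            self.total_rows, self.batch_size, self.distinct_prefixes, fetch, self.target_partitions
        )
    }
}

/// Cartesian product of the dimensions; all values are placeholders.
pub fn bench_matrix() -> Vec<BenchCase> {
    let total_rows = [10_000, 1_000_000];
    let batch_sizes = [1024, 8192];
    let distinct_prefixes = [1, 100, 10_000];
    let fetches = [None, Some(100)];
    let partitions = [1, 8];

    let mut cases = Vec::new();
    for &rows in &total_rows {
        for &batch in &batch_sizes {
            for &prefixes in &distinct_prefixes {
                for &fetch in &fetches {
                    for &parts in &partitions {
                        cases.push(BenchCase {
                            total_rows: rows,
                            batch_size: batch,
                            distinct_prefixes: prefixes,
                            fetch,
                            target_partitions: parts,
                        });
                    }
                }
            }
        }
    }
    cases
}

/// Non-decreasing prefix column values for `rows` rows.
/// Yields exactly `groups` distinct values when 1 <= groups <= rows.
pub fn presorted_prefix(rows: usize, groups: usize) -> Vec<i32> {
    (0..rows).map(|i| (i * groups / rows) as i32).collect()
}
```

   Each case's `id()` could serve as the parameter label of a Criterion `BenchmarkId`, so the before and after numbers line up case by case in the PR body. How a case maps onto the session configuration and the input table is left open here.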



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


