Re: [PR] feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) [datafusion]

via GitHub Sun, 05 Apr 2026 00:44:55 -0700


zhuqi-lucas commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3036520356



##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
                     .map(|partition| {
                         let stream =
                             self.input.execute(partition, Arc::clone(&context))?;
-                        Ok(spawn_buffered(stream, 1))
+                        Ok(spawn_buffered(stream, 16))

Review Comment:
    Actually, the current benchmarks reflect only the Exact path gains (sort
    elimination → scan limit for LIMIT queries). I haven't benchmarked the
    Inexact path separately yet, apart from the earlier reverse Inexact path.
   
    For the Inexact path, the performance gain would come from TopK + file
    reordering + the dynamic filter pruning subsequent files. For multi-file
    cases this works well: after the first file is read, the dynamic filter
    can skip the remaining files entirely via file-level early termination.
    However, for a single large file, row group selection is done upfront,
    before any data is read (the dynamic filter is still empty at that point),
    so all row groups are selected and TopK still needs to read through them.
    The wins would therefore be real but smaller than what the current
    benchmarks show, especially for single-file cases.
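
    To make the multi-file case concrete, here is a minimal standalone sketch
    of the idea (not the actual DataFusion code). `FileStats` and
    `topk_with_file_pruning` are hypothetical names invented for this
    illustration, and the TopK threshold stands in for the dynamic filter:

```rust
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a file plus its min statistic on the sort key.
/// This is an illustration type, not a DataFusion struct.
struct FileStats {
    min: i64,
    values: Vec<i64>,
}

/// Simulates `ORDER BY key ASC LIMIT k` over several files.
/// Returns the k smallest values (ascending) and how many files were read.
fn topk_with_file_pruning(mut files: Vec<FileStats>, k: usize) -> (Vec<i64>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }

    // File reordering: visit files with the smallest `min` first.
    files.sort_by_key(|f| f.min);

    // Max-heap holding the current k smallest values; its top is the
    // threshold that plays the role of the dynamic filter here.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut files_read = 0;

    for file in &files {
        // The filter is only checked between files. Once the heap is full,
        // a file whose min is not below the threshold cannot contribute,
        // and because files are ordered by min, neither can any later file.
        if heap.len() == k {
            if let Some(&threshold) = heap.peek() {
                if file.min >= threshold {
                    break; // file-level early termination
                }
            }
        }

        files_read += 1;
        for &v in &file.values {
            if heap.len() < k {
                heap.push(v);
            } else if let Some(&top) = heap.peek() {
                if v < top {
                    heap.pop();
                    heap.push(v);
                }
            }
        }
    }

    (heap.into_sorted_vec(), files_read)
}
```

    In this model the threshold is consulted only at file boundaries, so a
    single large file is always read in full. That mirrors the single-file
    limitation above, where row groups are selected before the filter has
    any value.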
   
    I'll split the PR and add Inexact-specific benchmarks to validate this.
    I'll keep the Exact path + prefetch changes for a follow-up PR.



