zhuqi-lucas commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3035156205


##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
                     .map(|partition| {
                         let stream =
                             self.input.execute(partition, Arc::clone(&context))?;
-                        Ok(spawn_buffered(stream, 1))
+                        Ok(spawn_buffered(stream, 16))

Review Comment:
    Thanks @adriangb for the suggestion! Just to confirm I understand 
correctly: the big win for the Inexact path would be statistics-based file 
reordering + TopK + dynamic filter pushdown, where TopK reads the first 
file, sets a tight threshold, and then skips subsequent files entirely via 
row group pruning?
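   
   A rough sketch of how I picture that flow, in plain Rust with no DataFusion 
types. `FileSketch` and `topk_with_file_pruning` are hypothetical names, and the 
per-file `min` stands in for real file statistics. Actual pruning would happen 
at row group granularity through the dynamic filter; this simplifies it to 
whole files for an `ORDER BY v ASC LIMIT k` query:

```rust
use std::collections::BinaryHeap;

/// Hypothetical stand-in for one input file: its values plus the `min`
/// statistic that a real reader would take from file metadata.
struct FileSketch {
    min: i64,
    values: Vec<i64>,
}

/// Simulates `ORDER BY v ASC LIMIT k` over `files`.
/// Returns the top-k values (ascending) and how many files were actually read.
fn topk_with_file_pruning(mut files: Vec<FileSketch>, k: usize) -> (Vec<i64>, usize) {
    // Statistics-based reordering: visit the file with the smallest min first.
    files.sort_by_key(|f| f.min);

    // Max-heap of the k smallest values seen so far; its top is the threshold.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut files_read = 0;

    for file in &files {
        // Dynamic-filter analogue: once the heap is full, a file whose min is
        // not below the current threshold cannot contribute, so skip it.
        if heap.len() == k {
            if let Some(&threshold) = heap.peek() {
                if file.min >= threshold {
                    continue;
                }
            }
        }
        files_read += 1;
        for &v in &file.values {
            if heap.len() < k {
                heap.push(v);
            } else if let Some(&threshold) = heap.peek() {
                if v < threshold {
                    // Tighten the threshold by replacing the current worst value.
                    heap.pop();
                    heap.push(v);
                }
            }
        }
    }
    (heap.into_sorted_vec(), files_read)
}
```

   With non-overlapping files, only the first file in min order is read; with 
overlapping ranges the threshold stays loose and more files are read, which is 
why the benefit depends on how well the statistics separate the files.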
   
   I've also addressed the prefetch concern in the latest push. It's now 
scoped to the sort elimination path only: I added a `prefetch` field to 
SortPreservingMergeExec (default 1), and it is set to 16 only when PushdownSort 
eliminates the SortExec under the SPM.
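   
   Roughly, the scoping looks like the sketch below. This is an illustration 
only, not the code in the PR: `MergeSketch`, `with_prefetch`, and `plan_merge` 
are hypothetical names, and only the values 1 and 16 come from the change.

```rust
/// Hypothetical mirror of the change: a per-operator prefetch depth.
#[derive(Debug, Clone, PartialEq)]
struct MergeSketch {
    /// Number of batches buffered ahead per input partition.
    prefetch: usize,
}

impl MergeSketch {
    /// Default depth, matching the previous hard-coded buffer size of 1.
    const DEFAULT_PREFETCH: usize = 1;
    /// Depth used only when the sort below the merge was eliminated.
    const SORT_ELIMINATED_PREFETCH: usize = 16;

    fn new() -> Self {
        Self { prefetch: Self::DEFAULT_PREFETCH }
    }

    /// Builder-style override of the prefetch depth.
    fn with_prefetch(mut self, prefetch: usize) -> Self {
        self.prefetch = prefetch;
        self
    }
}

/// Hypothetical optimizer hook: raise the prefetch depth only on the
/// sort elimination path, and leave every other plan at the default.
fn plan_merge(sort_eliminated: bool) -> MergeSketch {
    let merge = MergeSketch::new();
    if sort_eliminated {
        merge.with_prefetch(MergeSketch::SORT_ELIMINATED_PREFETCH)
    } else {
        merge
    }
}
```

   Plans that keep their SortExec still get a depth of 1, so the extra 
buffering is limited to the path where inputs stream straight from the scan.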
   
   If splitting is still preferred, I'm happy to do that. I'd need to add 
benchmarks specifically for the Inexact path (TopK + file reordering) to 
validate the performance gains. Let me know!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


