zhuqi-lucas commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3036520356
##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
.map(|partition| {
let stream =
self.input.execute(partition, Arc::clone(&context))?;
- Ok(spawn_buffered(stream, 1))
+ Ok(spawn_buffered(stream, 16))
Review Comment:
Actually, the current benchmarks reflect the Exact path gains (sort
elimination → scan limit for LIMIT queries). I haven't benchmarked the Inexact
path separately yet, apart from the previous reverse Inexact path.
For the Inexact path, the gain would come from TopK + file reordering +
dynamic filters pruning subsequent files. However, TopK still has to read the
first file fully: even with filter pushdown, we need the filter column to
construct the row selection (whereas on the Exact path the scan limit reads
only N rows and stops). This cost is largest when there is only one file and
it is huge. So the wins would be real, but smaller than what the current
benchmarks show.
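To make the difference in rows read concrete, here is a minimal sketch under simplifying assumptions: each file is an in-memory `Vec<i64>` with a min statistic, the query is `ORDER BY x ASC LIMIT n`, and `FileModel`, `exact_limit`, and `inexact_topk` are hypothetical names, not DataFusion APIs. The Exact path stops after `n` rows. The Inexact path keeps a bounded max-heap (as a TopK would), examines every row of the first file, then uses the heap's top as a dynamic threshold to skip files whose min cannot beat it. Treating a file with `min >= threshold` as prunable is an assumption of this sketch.

```rust
use std::collections::BinaryHeap;

// Hypothetical model of one data file: its values for the sort column `x`
// plus a min statistic (like file/row-group stats). Not a DataFusion type.
struct FileModel {
    rows: Vec<i64>,
    min: i64,
}

// Exact path sketch for `ORDER BY x ASC LIMIT n`: files are assumed to be
// already in global sort order, so the scan can stop after `n` rows.
// Returns (result, rows_read).
fn exact_limit(files: &[FileModel], n: usize) -> (Vec<i64>, usize) {
    // `flat_map` + `take` are lazy: rows past the first `n` are never visited.
    let out: Vec<i64> = files
        .iter()
        .flat_map(|f| f.rows.iter().copied())
        .take(n)
        .collect();
    let rows_read = out.len();
    (out, rows_read)
}

// Inexact path sketch: TopK over files reordered by min, with a dynamic
// threshold used to prune whole files. Returns (result, rows_read, files_pruned).
fn inexact_topk(files: &[FileModel], n: usize) -> (Vec<i64>, usize, usize) {
    // File reordering: visit the file with the smallest min first.
    let mut order: Vec<&FileModel> = files.iter().collect();
    order.sort_by_key(|f| f.min);

    // Max-heap holding the `n` smallest values seen so far.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(n + 1);
    let mut rows_read = 0;
    let mut files_pruned = 0;

    for file in order {
        // Dynamic filter: once the heap is full, its top is the threshold.
        // (Assumption) a file whose min is >= the threshold cannot improve the result.
        if heap.len() == n {
            if let Some(&threshold) = heap.peek() {
                if file.min >= threshold {
                    files_pruned += 1;
                    continue;
                }
            }
        }
        // No within-file order is assumed, so every row must be examined.
        for &v in &file.rows {
            rows_read += 1;
            if heap.len() < n {
                heap.push(v);
            } else if let Some(&top) = heap.peek() {
                if v < top {
                    heap.pop();
                    heap.push(v);
                }
            }
        }
    }
    // `into_sorted_vec` returns the kept values in ascending order.
    (heap.into_sorted_vec(), rows_read, files_pruned)
}

fn main() {
    let files = vec![
        FileModel { rows: (1..=8).collect(), min: 1 },
        FileModel { rows: vec![10, 11, 12], min: 10 },
        FileModel { rows: vec![20, 21, 22], min: 20 },
    ];
    let (exact, exact_read) = exact_limit(&files, 3);
    let (topk, topk_read, pruned) = inexact_topk(&files, 3);
    println!("exact: {:?}, rows read: {}", exact, exact_read);
    println!("topk:  {:?}, rows read: {}, files pruned: {}", topk, topk_read, pruned);
}
```

With this sample data both paths return `[1, 2, 3]`, but the Exact path reads 3 rows while the TopK path reads all 8 rows of the first file before pruning the other two files. With a single huge file, nothing is left to prune, so the full first-file read dominates.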
I'll split the PR and add Inexact-specific benchmarks to validate this, and
keep the Exact path + prefetch changes for a follow-up PR.
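The prefetch change in the diff raises the buffer passed to `spawn_buffered` from 1 to 16 for each merge input. As a rough synchronous analogy only (not DataFusion's actual `spawn_buffered`, which works on async record batch streams), the sketch below uses a std thread and a bounded `sync_channel`: the buffer size caps how many items the producer can have ready before the consumer pulls one. `spawn_buffered_sketch` and `prefetch_depth` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, TrySendError};
use std::thread;

// Hypothetical stand-in for a buffered spawn: a background thread drains
// `input` into a bounded channel of size `buffer`, so up to `buffer` items
// can be ready before the consumer asks for them.
fn spawn_buffered_sketch<I>(input: I, buffer: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(buffer);
    thread::spawn(move || {
        for item in input {
            // `send` blocks once `buffer` items are queued (backpressure).
            if tx.send(item).is_err() {
                break; // the consumer dropped the receiver
            }
        }
    });
    rx
}

// How many items a producer can queue without blocking for a given buffer size.
fn prefetch_depth(buffer: usize) -> usize {
    let (tx, _rx) = sync_channel::<u32>(buffer); // keep `_rx` alive
    let mut queued = 0;
    loop {
        match tx.try_send(0) {
            Ok(()) => queued += 1,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => return queued,
        }
    }
}

fn main() {
    let received: Vec<u32> = spawn_buffered_sketch(1..=5u32, 16).into_iter().collect();
    println!("received in order: {:?}", received);
    println!("prefetch depth: {} vs {}", prefetch_depth(1), prefetch_depth(16));
}
```

In this model, a larger buffer lets each input keep producing while the consumer is busy elsewhere, at the cost of holding up to that many extra items in memory per input.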