zhuqi-lucas commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3036520356
##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
.map(|partition| {
let stream =
self.input.execute(partition, Arc::clone(&context))?;
- Ok(spawn_buffered(stream, 1))
+ Ok(spawn_buffered(stream, 16))
Review Comment:
Actually, the current benchmarks reflect the gains from the Exact path (sort
elimination → scan limit for LIMIT queries). Apart from the earlier reverse
Inexact path, I haven't benchmarked the Inexact path separately yet.
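A minimal sketch of why the Exact path helps LIMIT queries, assuming every file is sorted on the key and the files' key ranges do not overlap. The `FileData` type and both functions are hypothetical illustrations, not DataFusion APIs. Once the scan order is known to match the requested order, the sort can be dropped and the scan can stop after `limit` rows:

```rust
/// Hypothetical per-file data: min/max statistics of the sort key plus
/// the file's rows (assumed sorted within the file).
struct FileData {
    min: i64,
    max: i64,
    rows: Vec<i64>,
}

/// True if scanning the files in this order yields a globally sorted stream:
/// each file is sorted and consecutive key ranges do not overlap.
fn is_exactly_ordered(files: &[FileData]) -> bool {
    files.iter().all(|f| f.rows.windows(2).all(|w| w[0] <= w[1]))
        && files.windows(2).all(|w| w[0].max <= w[1].min)
}

/// Exact path sketch: no sort operator; the LIMIT is pushed into the scan,
/// which stops once `limit` rows are produced. Also returns files opened.
fn scan_with_limit(files: &[FileData], limit: usize) -> (Vec<i64>, usize) {
    let mut out = Vec::with_capacity(limit);
    let mut files_opened = 0;
    for f in files {
        if out.len() >= limit {
            break; // later files are never opened
        }
        files_opened += 1;
        for &v in &f.rows {
            if out.len() >= limit {
                break;
            }
            out.push(v);
        }
    }
    (out, files_opened)
}

fn main() {
    let files = vec![
        FileData { min: 1, max: 3, rows: vec![1, 2, 3] },
        FileData { min: 4, max: 6, rows: vec![4, 5, 6] },
        FileData { min: 7, max: 9, rows: vec![7, 8, 9] },
    ];
    assert!(is_exactly_ordered(&files));
    let (rows, opened) = scan_with_limit(&files, 4);
    println!("{:?} from {} files", rows, opened); // [1, 2, 3, 4] from 2 files
}
```

The third file is never opened, which is where LIMIT queries gain the most on this path.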
For the Inexact path, the performance gain would come from TopK, file
reordering, and the dynamic filter pruning subsequent files. This works well
for multi-file cases: after the first file is read, the dynamic filter can
skip the remaining files entirely via file-level early termination.
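A rough sketch of the multi-file Inexact path for `ORDER BY key ASC LIMIT k`, using a plain `BinaryHeap` as the TopK and a single `min` statistic per file. All names are hypothetical; this is not DataFusion's TopK or dynamic filter code. Files are visited in ascending `min` order, and once the heap is full its largest value acts as the filter threshold:

```rust
use std::collections::BinaryHeap;

/// Hypothetical file: `min` of the sort key (from file statistics) plus
/// its rows in no particular order.
struct FileData {
    min: i64,
    rows: Vec<i64>,
}

/// Sketch of `ORDER BY key ASC LIMIT k`: TopK + file reordering + a
/// threshold that prunes files. Returns the top-k values and files read.
fn topk_with_dynamic_filter(mut files: Vec<FileData>, k: usize) -> (Vec<i64>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    // File reordering: smallest `min` first, so the heap fills with good
    // candidates as early as possible.
    files.sort_by_key(|f| f.min);

    // Max-heap holding the k smallest values seen so far.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut files_read = 0;

    for f in &files {
        // Dynamic filter: once the heap is full, its max is the threshold;
        // a file whose min is >= threshold cannot contribute any row.
        if heap.len() == k && f.min >= *heap.peek().unwrap() {
            // Files are sorted by min, so all later files are pruned as
            // well: file-level early termination.
            break;
        }
        files_read += 1;
        for &v in &f.rows {
            if heap.len() < k {
                heap.push(v);
            } else if v < *heap.peek().unwrap() {
                heap.pop();
                heap.push(v);
            }
        }
    }
    let mut top = heap.into_vec();
    top.sort();
    (top, files_read)
}

fn main() {
    let files = vec![
        FileData { min: 50, rows: vec![50, 70, 90] },
        FileData { min: 1, rows: vec![9, 1, 5, 3] },
        FileData { min: 20, rows: vec![20, 25] },
    ];
    let total = files.len();
    let (top, read) = topk_with_dynamic_filter(files, 3);
    println!("{:?} after reading {} of {} files", top, read, total);
    // [1, 3, 5] after reading 1 of 3 files
}
```

After the first file the threshold is 5, so the files starting at 20 and 50 are skipped without being opened.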
However, for a single large file, row group selection is done upfront, before
any data is read (the dynamic filter is still empty at that point), so all row
groups are selected and TopK still has to read through all of them. The wins
would therefore be real but smaller than the current benchmarks show,
especially for single-file cases.
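A small sketch of the single-file limitation, assuming row group pruning compares each row group's `min` statistic against the dynamic filter's threshold at the moment the file is opened. The `RowGroup` type and both functions are hypothetical. Because selection runs before any row is read, the threshold is still `None` and nothing can be pruned:

```rust
use std::collections::BinaryHeap;

/// Hypothetical row group: `min` statistic of the sort key plus its rows.
struct RowGroup {
    min: i64,
    rows: Vec<i64>,
}

/// Row group selection against the dynamic filter's current threshold.
/// `None` models a filter that has not been populated yet.
fn select_row_groups(groups: &[RowGroup], threshold: Option<i64>) -> Vec<usize> {
    (0..groups.len())
        .filter(|&i| threshold.map_or(true, |t| groups[i].min < t))
        .collect()
}

/// Feeds the selected row groups through a TopK (k smallest) heap and
/// returns the result plus how many row groups were decoded.
fn topk_over(groups: &[RowGroup], selected: &[usize], k: usize) -> (Vec<i64>, usize) {
    let mut heap = BinaryHeap::new();
    for &i in selected {
        for &v in &groups[i].rows {
            heap.push(v);
            if heap.len() > k {
                heap.pop(); // drop the largest, keeping the k smallest
            }
        }
    }
    let mut top = heap.into_vec();
    top.sort();
    (top, selected.len())
}

fn main() {
    let groups = vec![
        RowGroup { min: 1, rows: vec![1, 2, 3] },
        RowGroup { min: 40, rows: vec![40, 41] },
        RowGroup { min: 80, rows: vec![80, 81] },
    ];
    // Selection happens when the file is opened: the filter is empty,
    // so every row group is kept and then decoded by TopK.
    let selected = select_row_groups(&groups, None);
    let (top, decoded) = topk_over(&groups, &selected, 2);
    println!("{:?}, decoded {} of {} row groups", top, decoded, groups.len());
    // [1, 2], decoded 3 of 3 row groups
}
```

Had the threshold (2 after the first row group) been known at selection time, only row group 0 would have been kept; with upfront selection that information arrives too late.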
I'll split the PR and add Inexact-specific benchmarks to validate this, and
keep the Exact path and prefetch changes for a follow-up PR.
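The diff above raises the buffer size passed to `spawn_buffered` from 1 to 16, which presumably is the prefetch change deferred here. DataFusion's helper is async; the following is only a std-only analogue of the bounded-buffer idea (`spawn_buffered_sketch` is hypothetical), where a producer thread can run up to `capacity` items ahead of the consumer:

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Std-only analogue of buffering an input: a producer thread pulls items
/// from `input` and pushes them into a bounded channel, so up to `capacity`
/// items can be prefetched ahead of the consumer.
fn spawn_buffered_sketch<I>(input: I, capacity: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in input {
            // Blocks once `capacity` items are waiting, bounding memory use.
            if tx.send(item).is_err() {
                break; // the consumer dropped the receiver
            }
        }
    });
    rx
}

fn main() {
    let rx = spawn_buffered_sketch((0..5).map(|i| format!("batch-{i}")), 16);
    let batches: Vec<String> = rx.into_iter().collect();
    println!("{batches:?}");
}
```

The trade-off is memory for latency: a larger buffer holds more batches per input partition, but lets each producer keep decoding while the merge is busy with other inputs.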
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.