adriangb commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3035754360
##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
.map(|partition| {
let stream =
self.input.execute(partition, Arc::clone(&context))?;
- Ok(spawn_buffered(stream, 1))
+ Ok(spawn_buffered(stream, 16))
Review Comment:
> the big win for the Inexact path would be: statistics-based file
reordering + TopK + dynamic filter pushdown, where TopK reads the first
file, sets a tight threshold, and then skips subsequent files entirely via
row group pruning?
Yes, I think so, and I believe that's what the benchmarks reflect too, right?
If that's the case, then I'd prefer to merge most of this great work with
only that branch enabled.
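To make the `Inexact` flow concrete, here is a minimal, self-contained sketch of the idea: order files by their min statistic, run a TopK heap, and skip any file whose min cannot beat the current threshold. Everything here (`FileStats`, `topk_with_pruning`, a single `i64` sort column, `ORDER BY x ASC LIMIT k`) is a hypothetical simplification, not DataFusion's API. It also prunes whole files on file-level min, where the real path prunes row groups via the pushed-down dynamic filter.

```rust
use std::collections::BinaryHeap;

/// Hypothetical per-file statistics for one ascending sort column.
/// For `ORDER BY x ASC LIMIT k`, only the file-level min is needed to prune.
#[derive(Debug, Clone)]
struct FileStats {
    name: &'static str,
    min: i64,
    values: Vec<i64>, // stand-in for the rows actually stored in the file
}

/// Returns the k smallest values (ascending) and the names of skipped files.
fn topk_with_pruning(mut files: Vec<FileStats>, k: usize) -> (Vec<i64>, Vec<&'static str>) {
    if k == 0 {
        // Nothing to return, so no file needs to be read.
        return (Vec::new(), files.iter().map(|f| f.name).collect());
    }

    // 1. Statistics-based reordering: read the file with the smallest min first.
    files.sort_by_key(|f| f.min);

    // Max-heap of the current k smallest values; its top is the threshold.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut skipped = Vec::new();

    for file in &files {
        // 2. Dynamic filter: once the heap is full, the threshold is known.
        if heap.len() == k {
            let threshold = *heap.peek().unwrap();
            // 3. Pruning: no row in this file can be smaller than the threshold.
            if file.min >= threshold {
                skipped.push(file.name);
                continue;
            }
        }
        // Otherwise, scan the file and keep only the k smallest values.
        for &v in &file.values {
            if heap.len() < k {
                heap.push(v);
            } else if v < *heap.peek().unwrap() {
                heap.pop();
                heap.push(v);
            }
        }
    }
    (heap.into_sorted_vec(), skipped)
}

fn main() {
    let files = vec![
        FileStats { name: "a", min: 10, values: vec![10, 15, 20] },
        FileStats { name: "b", min: 1, values: vec![1, 3, 5] },
        FileStats { name: "c", min: 30, values: vec![30, 35, 40] },
    ];
    let (top, skipped) = topk_with_pruning(files, 3);
    // prints: top-3 = [1, 3, 5], skipped = ["a", "c"]
    println!("top-3 = {:?}, skipped = {:?}", top, skipped);
}
```

Because file `b` is read first, the threshold tightens to `5` after one file, and both `a` and `c` are skipped without being read. Without the reordering, reading `a` first would have forced at least one extra file scan.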
We could also merge the whole thing but put the `Exact` branch behind a
`reorder_sorted_scans` config option (or something similar), and drop the flag
if we figure out how to close the perf gap.
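A rough sketch of that gating, assuming a hypothetical `ScanOptions` struct that carries the `reorder_sorted_scans` flag (off by default), so the `Exact` branch is only taken when a user opts in and everything else falls back to `Inexact`. None of these types exist in DataFusion as written; they just illustrate the shape of the switch.

```rust
/// Hypothetical session option; the name mirrors the one floated above.
#[derive(Debug, Clone, Copy, Default)]
struct ScanOptions {
    reorder_sorted_scans: bool, // defaults to false, so `Exact` is opt-in
}

/// Hypothetical stand-in for the two pushdown branches in the PR.
#[derive(Debug, PartialEq)]
enum SortPushdown {
    Exact,
    Inexact,
}

/// Take the `Exact` branch only when the flag is on and the plan allows it.
fn choose_pushdown(opts: ScanOptions, can_be_exact: bool) -> SortPushdown {
    if opts.reorder_sorted_scans && can_be_exact {
        SortPushdown::Exact
    } else {
        SortPushdown::Inexact
    }
}

fn main() {
    let default_opts = ScanOptions::default();
    let opted_in = ScanOptions { reorder_sorted_scans: true };
    println!("{:?}", choose_pushdown(default_opts, true)); // Inexact
    println!("{:?}", choose_pushdown(opted_in, true)); // Exact
}
```

Removing the flag later would then amount to deleting the field and the `&&` check, without touching the `Inexact` path.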
But I'd say splitting it is probably the easiest path forward; the `Inexact`
path looks solid on its own.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.