Dandandan commented on code in PR #21182:
URL: https://github.com/apache/datafusion/pull/21182#discussion_r3036542606
##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -366,7 +366,7 @@ impl ExecutionPlan for SortPreservingMergeExec {
.map(|partition| {
let stream =
self.input.execute(partition,
Arc::clone(&context))?;
- Ok(spawn_buffered(stream, 1))
+ Ok(spawn_buffered(stream, 16))
Review Comment:
Sounds good - I think it's reasonable to do some buffering here (as we
effectively lose it from the unbounded buffering in SortExec).
Two points:
* We should call this buffering instead of prefetching (it's some IO of
course, but mostly it's about preparing the whole record batch stream from the
inner plan)
* Can we use the `BufferExec` for this?
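To make the "buffering, not prefetching" distinction concrete, here is a minimal std-only sketch of a bounded buffer in front of a producer. It is not the actual `spawn_buffered` or `BufferExec` code; the name `buffered` and the use of a plain thread with `sync_channel` are assumptions made purely for illustration. The producer runs ahead of the consumer by at most `capacity` items and then blocks.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper: drive `input` on a background thread and expose its
/// items through a bounded channel holding at most `capacity` pending items.
fn buffered<T, I>(input: I, capacity: usize) -> Receiver<T>
where
    T: Send + 'static,
    I: Iterator<Item = T> + Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in input {
            // `send` blocks while the buffer is full (backpressure) and
            // fails once the receiver is dropped, which stops the producer.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}
```

With `capacity` set to 16 instead of 1, a merge that is briefly busy with another input does not immediately stall this producer, which is roughly the effect the diff above aims for.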
Also slightly looking forward: I think we could benefit from parallel merge
(e.g. finding some split in the n streams and merging in parallel) in these
situations where sorting becomes mostly about merging.
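A rough sketch of what I mean by splitting, using plain sorted `Vec<i32>` runs rather than record batch streams. Everything here (`merge_runs`, `parallel_merge`, the caller-supplied `pivot`) is hypothetical and only meant to show the idea: cut every sorted input at the same value, merge the low parts and the high parts independently, then concatenate.

```rust
use std::thread;

/// Sequential k-way merge of sorted runs. A linear scan for the minimum
/// keeps the sketch short; a real merge would use a heap or loser tree.
fn merge_runs(runs: &[&[i32]]) -> Vec<i32> {
    let mut cursors = vec![0usize; runs.len()];
    let total: usize = runs.iter().map(|r| r.len()).sum();
    let mut out = Vec::with_capacity(total);
    loop {
        // Find the run whose current head is smallest (ties go to the
        // lower run index).
        let mut best: Option<usize> = None;
        for (i, run) in runs.iter().enumerate() {
            if cursors[i] < run.len() {
                match best {
                    Some(b) if runs[b][cursors[b]] <= run[cursors[i]] => {}
                    _ => best = Some(i),
                }
            }
        }
        match best {
            Some(i) => {
                out.push(runs[i][cursors[i]]);
                cursors[i] += 1;
            }
            None => return out, // every run is exhausted
        }
    }
}

/// Split every sorted run at `pivot`, merge both sides on separate threads,
/// and concatenate. Every value in the low side is < pivot and every value
/// in the high side is >= pivot, so concatenation preserves the order.
fn parallel_merge(runs: &[Vec<i32>], pivot: i32) -> Vec<i32> {
    let mut low: Vec<&[i32]> = Vec::new();
    let mut high: Vec<&[i32]> = Vec::new();
    for run in runs {
        // Binary search for the first element >= pivot in this sorted run.
        let cut = run.partition_point(|&v| v < pivot);
        let (l, h) = run.split_at(cut);
        low.push(l);
        high.push(h);
    }
    let (mut merged_low, merged_high) = thread::scope(|s| {
        let lo = s.spawn(|| merge_runs(&low));
        let hi = s.spawn(|| merge_runs(&high));
        (lo.join().unwrap(), hi.join().unwrap())
    });
    merged_low.extend(merged_high);
    merged_low
}
```

The hard part in practice is choosing split points that balance the work without having all the input available up front; the sketch sidesteps that by taking the pivot as an argument.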
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]