alamb commented on code in PR #12302: URL: https://github.com/apache/datafusion/pull/12302#discussion_r1743625178
##########
datafusion/physical-plan/src/sorts/merge.rs:
##########
@@ -156,12 +164,22 @@ impl<C: CursorValues> SortPreservingMergeStream<C> {
         }
         // try to initialize the loser tree
         if self.loser_tree.is_empty() {
-            // Ensure all non-exhausted streams have a cursor from which
-            // rows can be pulled
-            for i in 0..self.streams.partitions() {
-                if let Err(e) = ready!(self.maybe_poll_stream(cx, i)) {
-                    self.aborted = true;
-                    return Poll::Ready(Some(Err(e)));
+            // Ensure all non-exhausted streams have a cursor from which rows can be pulled
+            let remaining_partitions = self.uninitiated_partitions.clone();
+            for i in remaining_partitions {
+                match self.maybe_poll_stream(cx, i) {
+                    Poll::Ready(Err(e)) => {
+                        self.aborted = true;
+                        return Poll::Ready(Some(Err(e)));
+                    }
+                    Poll::Pending => {
+                        self.uninitiated_partitions.rotate_left(1);
+                        cx.waker().wake_by_ref();

Review Comment:
   I did some research -- see https://github.com/synnada-ai/datafusion-upstream/pull/34/files#r1743621057

   I think calling `wake_by_ref` effectively tells tokio to schedule this poll loop again after handling other tasks, which makes sense to me (as I am not sure how else we would signal to tokio that the merge is ready to go).

   But I share your concern that this will cause some sort of performance issue.

##########
datafusion/physical-plan/src/sorts/merge.rs:
##########
@@ -156,12 +164,22 @@ impl<C: CursorValues> SortPreservingMergeStream<C> {
         }
         // try to initialize the loser tree
         if self.loser_tree.is_empty() {
-            // Ensure all non-exhausted streams have a cursor from which
-            // rows can be pulled
-            for i in 0..self.streams.partitions() {
-                if let Err(e) = ready!(self.maybe_poll_stream(cx, i)) {

Review Comment:
   I coded up what I had in mind here: https://github.com/synnada-ai/datafusion-upstream/pull/34
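
For readers following the `wake_by_ref` discussion, here is a minimal sketch of the self-wake pattern using only the standard library. It is not the DataFusion code and it does not use tokio: `YieldNTimes`, `CountingWaker`, and `run_to_completion` are hypothetical names invented for this illustration, and the toy executor only approximates how a real runtime decides whether to poll a task again.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that reports `Pending` for its first `remaining` polls.
struct YieldNTimes {
    remaining: usize,
    /// If true, ask to be polled again before returning `Pending`.
    self_wake: bool,
}

impl Future for YieldNTimes {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        if self.self_wake {
            // Same call as in the diff above: signal that this task
            // should be scheduled for another poll.
            cx.waker().wake_by_ref();
        }
        Poll::Pending
    }
}

/// Waker that only counts how many times it has been woken.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Toy executor: re-polls only if the waker fired during the previous poll.
/// Returns the output and the number of polls, or `None` if the future
/// returned `Pending` without requesting a wake-up (it would hang).
fn run_to_completion<F: Future + Unpin>(mut fut: F) -> Option<(F::Output, usize)> {
    let counter = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        let wakes_before = counter.wakes.load(Ordering::SeqCst);
        polls += 1;
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(out) => return Some((out, polls)),
            Poll::Pending => {
                if counter.wakes.load(Ordering::SeqCst) == wakes_before {
                    return None;
                }
            }
        }
    }
}
```

Calling `run_to_completion(YieldNTimes { remaining: 3, self_wake: true })` completes after four polls, while the same future with `self_wake: false` stalls after the first poll because nothing asks for it to be polled again. In this toy loop every self-wake causes an immediate re-poll, which is a simplified view of the busy-polling cost behind the performance concern raised in the comment above.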