alamb commented on code in PR #12302:
URL: https://github.com/apache/datafusion/pull/12302#discussion_r1743625178


##########
datafusion/physical-plan/src/sorts/merge.rs:
##########
@@ -156,12 +164,22 @@ impl<C: CursorValues> SortPreservingMergeStream<C> {
         }
         // try to initialize the loser tree
         if self.loser_tree.is_empty() {
-            // Ensure all non-exhausted streams have a cursor from which
-            // rows can be pulled
-            for i in 0..self.streams.partitions() {
-                if let Err(e) = ready!(self.maybe_poll_stream(cx, i)) {
-                    self.aborted = true;
-                    return Poll::Ready(Some(Err(e)));
+            // Ensure all non-exhausted streams have a cursor from which rows can be pulled
+            let remaining_partitions = self.uninitiated_partitions.clone();
+            for i in remaining_partitions {
+                match self.maybe_poll_stream(cx, i) {
+                    Poll::Ready(Err(e)) => {
+                        self.aborted = true;
+                        return Poll::Ready(Some(Err(e)));
+                    }
+                    Poll::Pending => {
+                        self.uninitiated_partitions.rotate_left(1);
+                        cx.waker().wake_by_ref();

Review Comment:
   I did some research -- see 
https://github.com/synnada-ai/datafusion-upstream/pull/34/files#r1743621057
   
   I think calling `wake_by_ref` effectively tells tokio to schedule this poll 
loop again after handling other tasks, which makes sense to me (as I am not 
sure how else we would signal to tokio that the merge is ready to go).
   
   But I share your concern that this will cause some sort of performance issue.
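
   To illustrate the `wake_by_ref` behavior described above, here is a minimal
   sketch using only the standard library. It is not DataFusion or tokio code:
   `YieldingTask`, `CountingWaker`, and `run` are hypothetical names, and the
   toy executor only approximates what a real runtime does. It shows that a
   task returning `Poll::Pending` is polled again only if its waker was
   signaled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a stream whose inputs are not ready yet:
/// returns `Pending` a fixed number of times before completing.
struct YieldingTask {
    pending_left: usize,
}

impl Future for YieldingTask {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.pending_left == 0 {
            return Poll::Ready("done");
        }
        self.pending_left -= 1;
        // Nothing else will wake this task, so ask to be polled again
        // before yielding control back to the executor.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that only counts how many times it was signaled.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Toy single-task executor (an assumption, far simpler than tokio).
/// Returns the output and the number of polls, or `None` if the task
/// returned `Pending` without waking, which would stall it forever.
fn run<F: Future>(fut: F) -> Option<(F::Output, usize)> {
    let counter = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        let wakes_before = counter.wakes.load(Ordering::SeqCst);
        polls += 1;
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return Some((out, polls)),
            Poll::Pending => {
                if counter.wakes.load(Ordering::SeqCst) == wakes_before {
                    return None;
                }
            }
        }
    }
}
```

   Calling `run(YieldingTask { pending_left: 3 })` completes after four polls,
   while a future that returns `Pending` without waking (for example
   `std::future::pending::<()>()`) makes `run` return `None`. Each self-wake
   costs an extra poll even when no input has made progress, which is one way
   the performance concern could show up.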



##########
datafusion/physical-plan/src/sorts/merge.rs:
##########
@@ -156,12 +164,22 @@ impl<C: CursorValues> SortPreservingMergeStream<C> {
         }
         // try to initialize the loser tree
         if self.loser_tree.is_empty() {
-            // Ensure all non-exhausted streams have a cursor from which
-            // rows can be pulled
-            for i in 0..self.streams.partitions() {
-                if let Err(e) = ready!(self.maybe_poll_stream(cx, i)) {

Review Comment:
   I coded up what I had in mind here: 
https://github.com/synnada-ai/datafusion-upstream/pull/34
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

