alamb commented on a change in pull request #8503:
URL: https://github.com/apache/arrow/pull/8503#discussion_r512668107



##########
File path: rust/datafusion/src/physical_plan/merge.rs
##########
@@ -103,37 +105,56 @@ impl ExecutionPlan for MergeExec {
                 self.input.execute(0).await
             }
             _ => {
-                let tasks = (0..input_partitions).map(|part_i| {
+                let (sender, receiver) = 
mpsc::unbounded::<ArrowResult<RecordBatch>>();

Review comment:
       I am fairly convinced that the execution is not happening 
`async`hronously -- in particular, I think there is something about this code 
(or perhaps how some of the other collection functions) that means all data 
must be available *before* the stream future itself is returned. 
   
   When I was playing with it, if `poll_next` ever returns `Pending` then 
`poll_next` is never called again. I think the adapter you outline above is a 
good idea; I think it also needs to arrange for `cx.waker()` to be woken if 
anything is added to the underlying stream.
   
   I have a bunch of ideas in my head, but I have not had time to chase them 
down yet, sadly :(




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to