jorgecarleitao commented on a change in pull request #8503:
URL: https://github.com/apache/arrow/pull/8503#discussion_r510134213



##########
File path: rust/datafusion/src/physical_plan/merge.rs
##########
@@ -103,37 +105,56 @@ impl ExecutionPlan for MergeExec {
                 self.input.execute(0).await
             }
             _ => {
-                let tasks = (0..input_partitions).map(|part_i| {
+                let (sender, receiver) = mpsc::unbounded::<ArrowResult<RecordBatch>>();

Review comment:
       Good point.
   
   I actually started with `bounded(1)`, but this (and any other bound) does 
not work because we need to `join_all` the threads. Since nothing consumes 
items from the `receiver` until the join completes, the senders block once the 
channel is full, the threads never finish, and we wait indefinitely on the 
`join_all`.
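
   To make the blocking concrete, here is a minimal sketch using the standard 
library's `std::sync::mpsc::sync_channel` as a stand-in for the bounded 
channel in the PR (an assumption for illustration; the PR uses a different 
`mpsc` with async tasks). The helper name `second_send_would_block` is 
hypothetical. With a capacity of one and no consumer, the second item has 
nowhere to go:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns true if a second send on a channel bounded to one item would
/// block because nothing has consumed the first item yet.
fn second_send_would_block() -> bool {
    // Keep the receiver alive but never read from it.
    let (sender, _receiver) = sync_channel::<u32>(1);

    // The first item fits in the buffer.
    sender.try_send(1).expect("buffer has room for one item");

    // The second item does not; a plain `send` would block here.
    matches!(sender.try_send(2), Err(TrySendError::Full(_)))
}
```

   A producer that hits this state can only finish once someone reads from the 
receiver, so joining the producers before reading never completes.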
   
   An alternative is to not `join_all`, but then we risk losing results if the 
main thread finishes first.
   
   I.e. if we bound the channel, we cannot `join_all`, and thus we may lose 
results from threads that are still running. If we `join_all`, the channel 
needs to be unbounded so that we can build the stream.
   
   IMO neither is good, as in both cases we are essentially waiting for all 
threads to finish before returning the stream, which I understand is not what 
we want.
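
   For comparison, a rough sketch of a third ordering: keep the channel 
bounded, but drain the receiver before joining, so producers are never stuck 
on a full buffer. This uses `std::thread` and `std::sync::mpsc::sync_channel` 
rather than the async tasks and channel in the PR, and `merge_partitions` is a 
hypothetical helper, so treat it as an illustration of the ordering only, not 
as a proposal for `MergeExec`:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs `partitions` producer threads, each sending `per_partition` items
/// through a channel bounded to one item, and returns everything received.
fn merge_partitions(partitions: u32, per_partition: u32) -> Vec<u32> {
    let (sender, receiver) = sync_channel::<u32>(1);

    let handles: Vec<_> = (0..partitions)
        .map(|part_i| {
            let sender = sender.clone();
            thread::spawn(move || {
                for i in 0..per_partition {
                    // Blocks while the buffer is full (backpressure).
                    sender
                        .send(part_i * per_partition + i)
                        .expect("receiver is alive");
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees the channel close
    // once every producer thread has finished.
    drop(sender);

    // Drain first: this ends only when all senders have been dropped.
    let merged: Vec<u32> = receiver.iter().collect();

    // Joining is now safe: no producer can still be blocked on `send`.
    for handle in handles {
        handle.join().expect("producer thread panicked");
    }
    merged
}
```

   Here the consumer collects everything into a `Vec`, which still waits for 
all producers; in a streaming setting the receiver itself would presumably be 
handed to the caller instead, so items are yielded as they arrive.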




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

