jorgecarleitao commented on a change in pull request #8503: URL: https://github.com/apache/arrow/pull/8503#discussion_r510134213
##########
File path: rust/datafusion/src/physical_plan/merge.rs
##########
@@ -103,37 +105,56 @@ impl ExecutionPlan for MergeExec {
                 self.input.execute(0).await
             }
             _ => {
-                let tasks = (0..input_partitions).map(|part_i| {
+                let (sender, receiver) = mpsc::unbounded::<ArrowResult<RecordBatch>>();

Review comment:
   Good point. I actually started with `bounded(1)`, but this (and any other bound) won't work because we need to `join_all` the threads. Because there is no consumer retrieving items from the `receiver` at that point, the producers stall once the channel is full, the threads can never be joined, and we wait indefinitely on the `join_all`.

   An alternative is to not `join_all`, but then we risk losing results if the main thread finishes first.

   I.e. if we bound the channel, we cannot `join_all`, and thus we may lose results. If we `join_all`, the channel needs to be unbounded so that we can build the stream.

   IMO neither is good, as in both cases we are essentially waiting for all threads to finish before returning the stream, which I understand is not what we want.
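   To make the trade-off above concrete, here is a minimal sketch using plain `std` threads and `std::sync::mpsc::sync_channel` as a stand-in for the async tasks and the `futures` channel in the PR. The helper `merge_partitions`, its parameters, and the `i32` items are hypothetical and only illustrate the ordering problem: joining the producers before anything reads from a bounded channel can block forever, while draining first lets the joins complete. It is not necessarily how the PR should resolve this.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical helper: merges several "partitions" into one Vec through a
/// bounded channel. Names and types are illustrative, not DataFusion's.
fn merge_partitions(partitions: Vec<Vec<i32>>, capacity: usize) -> Vec<i32> {
    // Bounded channel: `send` blocks once `capacity` items are queued.
    let (sender, receiver) = sync_channel::<i32>(capacity);

    let handles: Vec<_> = partitions
        .into_iter()
        .map(|partition| {
            let sender = sender.clone();
            thread::spawn(move || {
                for item in partition {
                    // Blocks while the channel is full; fails only if the
                    // receiver has been dropped.
                    if sender.send(item).is_err() {
                        break;
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees end-of-stream once every
    // producer thread has dropped its clone.
    drop(sender);

    // Drain first. Joining the handles before this point could block forever
    // when the producers have more items than `capacity`, because nothing
    // would be reading from the channel.
    let merged: Vec<i32> = receiver.iter().collect();

    // All senders are gone by now, so the producers are done sending and
    // these joins do not wait on the channel.
    for handle in handles {
        handle.join().expect("producer thread panicked");
    }
    merged
}
```

   Calling `merge_partitions(vec![vec![1, 2, 3], vec![4, 5], vec![6]], 1)` returns all six values, in an order that depends on thread scheduling. Note that this sketch still collects everything before returning; a streaming version would hand the `receiver` to the caller instead of draining it here.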