crepererum commented on code in PR #6310:
URL: https://github.com/apache/arrow-datafusion/pull/6310#discussion_r1189155285
##########
datafusion/core/src/physical_plan/repartition/mod.rs:
##########
@@ -532,9 +541,28 @@ impl RepartitionExec {
timer.done();
}
- // If the input stream is endless, we may spin forever and never yield back to tokio. Hence let us yield.
- // See https://github.com/apache/arrow-datafusion/issues/5278.
- tokio::task::yield_now().await;
+ // If the input stream is endless, we may spin forever and
Review Comment:
You can call it a bug or a design issue in DF / tokio, but the effect is the
same: tokio schedules tasks cooperatively, so if you run two spawned tasks and
one never returns control to tokio, the other will never run.
Unbounded buffers are NOT avoidable in the current DF design, because you
cannot predict tokio's scheduling or the hash outputs. So the fix here is
adequate. `consume_budget` would be the better solution, but it is an unstable
tokio feature, so it is not usable here.
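
To make the starvation argument concrete, here is a minimal std-only sketch. It is not tokio and not the DataFusion code: the toy round-robin executor in `run` only stands in for a single tokio worker, and `YieldNow`, `producer`, `other`, and `run` are hypothetical names invented for this illustration. `YieldNow` plays the role that `tokio::task::yield_now()` plays in the hunk above.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `tokio::task::yield_now()`: returns `Pending`
/// exactly once so the executor can poll other tasks.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// The toy executor polls in a loop, so wake-ups can be ignored.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Log = Arc<Mutex<Vec<String>>>;

/// Models the repartition loop: processes batches and yields only if asked to.
async fn producer(log: Log, batches: usize, cooperative: bool) {
    for i in 0..batches {
        log.lock().unwrap().push(format!("producer batch {i}"));
        if cooperative {
            YieldNow(false).await;
        }
    }
}

/// Models any other spawned task waiting for its turn.
async fn other(log: Log) {
    log.lock().unwrap().push("other task ran".to_string());
}

/// Toy single-threaded round-robin executor (an assumption for illustration,
/// far simpler than tokio's scheduler). Returns the order of events.
fn run(cooperative: bool) -> Vec<String> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let mut tasks: Vec<Pin<Box<dyn Future<Output = ()>>>> = Vec::new();
    tasks.push(Box::pin(producer(log.clone(), 3, cooperative)));
    tasks.push(Box::pin(other(log.clone())));

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Poll each task once per round; drop the ones that finished.
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
    let events = log.lock().unwrap().clone();
    events
}
```

With `run(false)` the producer finishes all three batches before the other task is polled at all; with an endless input it would never be polled. With `run(true)` the other task runs right after the first batch, which is the effect the explicit yield is meant to have.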