This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 072098e957 Document guidelines for physical operator yielding (#15030)
072098e957 is described below
commit 072098e95771325c1ebb65d8cb4f1d029c2bd842
Author: Carol (Nichols || Goulding)
<[email protected]>
AuthorDate: Fri Mar 14 14:59:51 2025 -0400
Document guidelines for physical operator yielding (#15030)
* Document guidelines for physical operator yielding
To start a policy of the behavior physical operator streams should have
and drive improvements in this area to allow for timely cancellation.
Connects to #14036 and related to pull requests such as #14028.
* Remove discussion of tokio coop
* Move rationale up
* TODO ADD LINK
* Say block the CPU rather than pin the CPU
* Add a caveat to use the right tool for the situation
* Improve documentation of yield guidelines
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
* Fix newlines and whitespace in comment
* Add a link to the cancellation benchmark documented in the README
* Fix newlines in benchmarks README
---------
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
---
benchmarks/README.md | 3 ++-
datafusion/physical-plan/src/execution_plan.rs | 19 +++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/benchmarks/README.md b/benchmarks/README.md
index f17d6b5a07..39b4584bd2 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -333,7 +333,8 @@ The output of `dfbench` help includes a description of each
benchmark, which is
## Cancellation
-Test performance of cancelling queries
+Test performance of cancelling queries.
+
Queries in DataFusion should stop executing "quickly" after they are
cancelled (the output stream is dropped).
diff --git a/datafusion/physical-plan/src/execution_plan.rs
b/datafusion/physical-plan/src/execution_plan.rs
index d7556bc07c..40bff635bc 100644
--- a/datafusion/physical-plan/src/execution_plan.rs
+++ b/datafusion/physical-plan/src/execution_plan.rs
@@ -260,13 +260,32 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
/// used.
/// Thus, [`spawn`] is disallowed, and instead use [`SpawnedTask`].
///
+ /// To enable timely cancellation, the [`Stream`] that is returned must not
+ /// block the CPU indefinitely and must yield back to the tokio runtime
regularly.
+ /// In a typical [`ExecutionPlan`], this automatically happens unless
there are
+ /// special circumstances; e.g. when the computational complexity of
processing a
+ /// batch is superlinear. See this [general guideline][async-guideline]
for more context
+ /// on this point, which explains why one should avoid spending a long
time without
+ /// reaching an `await`/yield point in asynchronous runtimes.
+ /// This can be achieved by manually returning [`Poll::Pending`] and
setting up wakers
+ /// appropriately, or the use of [`tokio::task::yield_now()`] when
appropriate.
+ /// In special cases that warrant manual yielding, determination for
"regularly" may be
+ /// made using a timer (being careful with the overhead-heavy system call
needed to
+ /// take the time), or by counting rows or batches.
+ ///
+ /// The [cancellation benchmark] tracks some cases of how quickly queries
can
+ /// be cancelled.
+ ///
/// For more details see [`SpawnedTask`], [`JoinSet`] and
[`RecordBatchReceiverStreamBuilder`]
/// for structures to help ensure all background tasks are cancelled.
///
/// [`spawn`]: tokio::task::spawn
+ /// [cancellation benchmark]:
https://github.com/apache/datafusion/blob/main/benchmarks/README.md#cancellation
/// [`JoinSet`]: tokio::task::JoinSet
/// [`SpawnedTask`]: datafusion_common_runtime::SpawnedTask
/// [`RecordBatchReceiverStreamBuilder`]:
crate::stream::RecordBatchReceiverStreamBuilder
+ /// [`Poll::Pending`]: std::task::Poll::Pending
+ /// [async-guideline]: https://ryhl.io/blog/async-what-is-blocking/
///
/// # Implementation Examples
///
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]