This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
     new c37b851332 Minor: Add more links to cooperative / scheduling docs (#16484)
c37b851332 is described below

commit c37b851332b5abe71d4c6019d5114678b8964be4
Author: Andrew Lamb <and...@nerdnetworks.org>
AuthorDate: Mon Jun 23 16:25:37 2025 -0400

    Minor: Add more links to cooperative / scheduling docs (#16484)
---
 datafusion/core/src/lib.rs           | 13 ++++++++++++-
 datafusion/physical-plan/src/coop.rs |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/datafusion/core/src/lib.rs b/datafusion/core/src/lib.rs
index 3e3d80caaa..7a4a1201d6 100644
--- a/datafusion/core/src/lib.rs
+++ b/datafusion/core/src/lib.rs
@@ -498,10 +498,21 @@
 //! While preparing for execution, DataFusion tries to create this many distinct
 //! `async` [`Stream`]s for each [`ExecutionPlan`].
 //! The [`Stream`]s for certain [`ExecutionPlan`]s, such as [`RepartitionExec`]
-//! and [`CoalescePartitionsExec`], spawn [Tokio] [`task`]s, that are run by
+//! and [`CoalescePartitionsExec`], spawn [Tokio] [`task`]s, that run on
 //! threads managed by the [`Runtime`].
 //! Many DataFusion [`Stream`]s perform CPU intensive processing.
 //!
+//! ### Cooperative Scheduling
+//!
+//! DataFusion uses cooperative scheduling, which means that each [`Stream`]
+//! is responsible for yielding control back to the [`Runtime`] after
+//! some amount of work is done. Please see the [`coop`] module documentation
+//! for more details.
+//!
+//! [`coop`]: datafusion_physical_plan::coop
+//!
+//! ### Network I/O and CPU intensive tasks
+//!
 //! Using `async` for CPU intensive tasks makes it easy for [`TableProvider`]s
 //! to perform network I/O using standard Rust `async` during execution.
 //! However, this design also makes it very easy to mix CPU intensive and latency
diff --git a/datafusion/physical-plan/src/coop.rs b/datafusion/physical-plan/src/coop.rs
index d55c7b8c97..be0afa07ea 100644
--- a/datafusion/physical-plan/src/coop.rs
+++ b/datafusion/physical-plan/src/coop.rs
@@ -19,7 +19,7 @@
 //!
 //! # Cooperative scheduling
 //!
-//! A single call to `poll_next` on a top-level `Stream` may potentially perform a lot of work
+//! A single call to `poll_next` on a top-level [`Stream`] may potentially perform a lot of work
 //! before it returns a `Poll::Pending`. Think for instance of calculating an aggregation over a
 //! large dataset.
 //! If a `Stream` runs for a long period of time without yielding back to the Tokio executor,

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org
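
For readers unfamiliar with the idea the new docs link to, here is a minimal sketch of cooperative yielding: a unit of async work does a bounded amount of processing per poll, then returns `Poll::Pending` so the executor can run something else. It uses only the Rust standard library. It is not the code in DataFusion's `coop` module and does not use Tokio. The names `BudgetedSum`, `NoopWake`, and `run_counting_yields` are hypothetical, and a plain `Future` stands in for a `Stream` to keep the example short.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical CPU-bound future: sums `next..limit`, but processes at most
/// `budget` items per call to `poll` before yielding.
struct BudgetedSum {
    next: u64,
    limit: u64,
    total: u64,
    budget: u64,
}

impl Future for BudgetedSum {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // All fields are plain integers, so the type is `Unpin`.
        let this = self.get_mut();
        let mut remaining = this.budget;
        while this.next < this.limit {
            if remaining == 0 {
                // Budget exhausted with work left: request another poll,
                // then hand control back to the executor.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            this.total += this.next;
            this.next += 1;
            remaining -= 1;
        }
        Poll::Ready(this.total)
    }
}

/// Waker that does nothing; enough for the busy-polling loop below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (an assumption for illustration): polls until the future is
/// ready and reports how many times it returned `Poll::Pending`.
fn run_counting_yields<F: Future + Unpin>(mut fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut yields = 0;
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(out) => return (out, yields),
            Poll::Pending => yields += 1,
        }
    }
}

fn main() {
    let fut = BudgetedSum { next: 0, limit: 10, total: 0, budget: 4 };
    let (sum, yields) = run_counting_yields(fut);
    println!("sum = {sum}, yields = {yields}");
}
```

With a budget of 4 and ten items, the future yields twice before completing. A real executor would use those yield points to run other tasks or to observe cancellation, which is the problem the `coop` module documentation describes.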