Weston Pace created ARROW-17350:
-----------------------------------
Summary: [C++] Create a scheduler for asynchronous work
Key: ARROW-17350
URL: https://issues.apache.org/jira/browse/ARROW-17350
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Weston Pace
Note, in the interest of keeping things simple, this ideally replaces the
AsyncTaskGroup. This is needed to simplify the logic in ARROW-17287.
The format and implementation will likely be inspired by the synchronous
schedulers, TaskScheduler and TaskGroup but it will remain a separate
implementation. In the future, when we dedicate time to improving our
synchronous scheduler, we can decide if it makes sense to merge these two types.
{noformat}
/// A utility which keeps tracks of, and schedules, asynchronous tasks
///
/// An asynchronous task has a synchronous component and an asynchronous
component.
/// The synchronous component typically schedules some kind of work on an
external
/// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
/// resource like io_uring). The asynchronous part represents the work
/// done on that external resource. Executing the synchronous part will be
referred
/// to as "submitting the task" since this usually includes submitting the
asynchronous
/// portion to the external thread pool.
///
/// By default the scheduler will submit the task (execute the synchronous
part) as
/// soon as it is added, assuming the underlying thread pool hasn't terminated
or the
/// scheduler hasn't aborted. In this mode the scheduler is simply acting as
/// a task group, keeping track of the ongoing work.
///
/// This can be used to provide structured concurrency for asynchronous
development.
/// A task group created at a high level can be distributed amongst low level
components
/// which register work to be completed. The high level job can then wait for
all work
/// to be completed before cleaning up.
///
/// A task scheduler must eventually be ended when all tasks have been added.
Once the
/// scheduler has been ended it is an error to add further tasks. Note, it is
not an
/// error to add additional tasks after a scheduler has aborted (though these
tasks
/// will be ignored and never submitted). The scheduler has a futuer which
will complete
/// once the scheduler has been ended AND all remaining tasks have finished
executing.
/// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
///
/// Task failure (either the synchronous portion or the asynchronous portion)
will cause
/// the scheduler to enter an aborted state. The first such failure will be
reported in
/// the final task future.
///
/// The scheduler can also be manually aborted. A cancellation status will be
reported as
/// the final task future.
///
/// It is also possible to limit the number of concurrent tasks the scheduler
will
/// execute. This is done by setting a task limit. The task limit initially
assumes all
/// tasks are equal but a custom cost can be supplied when scheduling a task
(e.g. based
/// on the total I/O cost of the task, or the expected RAM utilization of the
task)
///
/// When the total number of running tasks is limited then scheduler priority
may also
/// become a consideration. By default the scheduler runs with a FIFO queue
but a custom
/// task queue can be provided. One could, for example, use a priority queue
to control
/// the order in which tasks are executed.
///
/// It is common to have multiple stages of execution. For example, when
scanning, we
/// first inspect each fragment (the inspect stage) to figure out the row
groups and then
/// we scan row groups (the scan stage) to read in the data. This sort of
multi-stage
/// execution should be represented as two seperate task groups. The first
task group can
/// then have a custom finish callback which ends the second task group.
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)