[jira] [Created] (ARROW-17350) [C++] Create a scheduler for asynchronous work

Weston Pace (Jira) Mon, 08 Aug 2022 17:39:30 -0700

Weston Pace created ARROW-17350:
-----------------------------------

             Summary: [C++] Create a scheduler for asynchronous work
                 Key: ARROW-17350
                 URL: https://issues.apache.org/jira/browse/ARROW-17350
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Weston Pace



Note, in the interest of keeping things simple, this ideally replaces the 
AsyncTaskGroup.  This is needed to simplify the logic in ARROW-17287.

The format and implementation will likely be inspired by the synchronous 
schedulers, TaskScheduler and TaskGroup but it will remain a separate 
implementation.  In the future, when we dedicate time to improving our 
synchronous scheduler, we can decide if it makes sense to merge these two types.

{noformat}
/// A utility which keeps track of, and schedules, asynchronous tasks
///
/// An asynchronous task has a synchronous component and an asynchronous
/// component.  The synchronous component typically schedules some kind of
/// work on an external resource (e.g. the I/O thread pool or some kind of
/// kernel-based asynchronous resource like io_uring).  The asynchronous part
/// represents the work done on that external resource.  Executing the
/// synchronous part will be referred to as "submitting the task" since this
/// usually includes submitting the asynchronous portion to the external
/// thread pool.
///
/// By default the scheduler will submit the task (execute the synchronous
/// part) as soon as it is added, assuming the underlying thread pool hasn't
/// terminated or the scheduler hasn't aborted.  In this mode the scheduler is
/// simply acting as a task group, keeping track of the ongoing work.
///
/// This can be used to provide structured concurrency for asynchronous
/// development.  A task group created at a high level can be distributed
/// amongst low level components which register work to be completed.  The
/// high level job can then wait for all work to be completed before cleaning
/// up.
///
/// A task scheduler must eventually be ended when all tasks have been added.
/// Once the scheduler has been ended it is an error to add further tasks.
/// Note, it is not an error to add additional tasks after a scheduler has
/// aborted (though these tasks will be ignored and never submitted).  The
/// scheduler has a future which will complete once the scheduler has been
/// ended AND all remaining tasks have finished executing.  Ending a scheduler
/// will NOT cause the scheduler to flush existing tasks.
///
/// Task failure (either the synchronous portion or the asynchronous portion)
/// will cause the scheduler to enter an aborted state.  The first such
/// failure will be reported in the final task future.
///
/// The scheduler can also be manually aborted.  A cancellation status will be
/// reported in the final task future.
///
/// It is also possible to limit the number of concurrent tasks the scheduler
/// will execute.  This is done by setting a task limit.  The task limit
/// initially assumes all tasks are equal but a custom cost can be supplied
/// when scheduling a task (e.g. based on the total I/O cost of the task, or
/// the expected RAM utilization of the task).
///
/// When the total number of running tasks is limited then scheduler priority
/// may also become a consideration.  By default the scheduler runs with a
/// FIFO queue but a custom task queue can be provided.  One could, for
/// example, use a priority queue to control the order in which tasks are
/// executed.
///
/// It is common to have multiple stages of execution.  For example, when
/// scanning, we first inspect each fragment (the inspect stage) to figure out
/// the row groups and then we scan row groups (the scan stage) to read in the
/// data.  This sort of multi-stage execution should be represented as two
/// separate task groups.  The first task group can then have a custom finish
/// callback which ends the second task group.
{noformat}
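
To make the semantics described in the comment concrete, here is a minimal, single-threaded sketch. It is not the proposed Arrow implementation: every name in it (ToyAsyncScheduler, Status, Done, Submit, RunDemo) is hypothetical, a finish callback stands in for the scheduler's future, and a plain deque stands in for the external thread pool. It assumes each task either returns false from its synchronous part or calls its completion callback exactly once, and that the scheduler outlives its tasks.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the Arrow API.
enum class Status { kOk, kFailed, kCancelled };

class ToyAsyncScheduler {
 public:
  // Invoked by a task when its asynchronous portion completes.
  using Done = std::function<void(bool ok)>;
  // The synchronous portion ("submitting the task"); false means it failed.
  using Submit = std::function<bool(Done)>;

  ToyAsyncScheduler(int capacity, std::function<void(Status)> on_finished)
      : capacity_(capacity), on_finished_(std::move(on_finished)) {}

  // Returns false if the scheduler was already ended (an error).  Tasks added
  // after an abort are accepted but ignored and never submitted.
  bool AddTask(Submit submit, int cost = 1) {
    if (ended_) return false;
    if (aborted_) return true;
    queue_.push_back(Entry{std::move(submit), cost});
    Pump();
    return true;
  }
  void End() { ended_ = true; MaybeFinish(); }
  void Abort() { Fail(Status::kCancelled); MaybeFinish(); }

 private:
  struct Entry { Submit submit; int cost; };

  // Only the first failure is recorded; queued tasks are dropped.
  void Fail(Status status) {
    if (aborted_) return;
    aborted_ = true;
    status_ = status;
    queue_.clear();
  }
  // Submit queued tasks in FIFO order while they fit under the cost limit.
  // A task is always admitted when nothing is running, to avoid starvation.
  void Pump() {
    while (!aborted_ && !queue_.empty() &&
           (used_ == 0 || used_ + queue_.front().cost <= capacity_)) {
      Entry entry = std::move(queue_.front());
      queue_.pop_front();
      int cost = entry.cost;
      used_ += cost;
      ++running_;
      if (!entry.submit([this, cost](bool ok) { OnDone(cost, ok); })) {
        used_ -= cost;
        --running_;
        Fail(Status::kFailed);
      }
    }
  }
  void OnDone(int cost, bool ok) {
    used_ -= cost;
    --running_;
    if (!ok) Fail(Status::kFailed);
    Pump();
    MaybeFinish();
  }
  // Finish only once the scheduler is ended AND nothing is still running.
  void MaybeFinish() {
    if (finished_ || !ended_ || running_ > 0 || !queue_.empty()) return;
    finished_ = true;
    on_finished_(status_);
  }

  int capacity_;
  std::function<void(Status)> on_finished_;
  std::deque<Entry> queue_;
  int used_ = 0, running_ = 0;
  bool ended_ = false, aborted_ = false, finished_ = false;
  Status status_ = Status::kOk;
};

// Two-stage scan: the inspect stage's finish callback ends the scan stage.
std::vector<std::string> RunDemo() {
  std::vector<std::string> log;
  std::deque<std::function<void()>> pool;  // stands in for an I/O thread pool
  ToyAsyncScheduler scan(2, [&](Status) { log.push_back("scan finished"); });
  ToyAsyncScheduler inspect(2, [&](Status) {
    log.push_back("inspect finished");
    scan.End();
  });
  auto scan_task = [&](std::string name) -> ToyAsyncScheduler::Submit {
    return [&log, &pool, name](ToyAsyncScheduler::Done done) {
      log.push_back("submit " + name);
      pool.push_back([done] { done(true); });
      return true;
    };
  };
  auto inspect_task = [&](std::string frag) -> ToyAsyncScheduler::Submit {
    return [&, frag](ToyAsyncScheduler::Done done) {
      log.push_back("submit inspect " + frag);
      pool.push_back([&, frag, done] {  // inspection discovers two row groups
        scan.AddTask(scan_task("scan " + frag + "/rg0"));
        scan.AddTask(scan_task("scan " + frag + "/rg1"));
        done(true);
      });
      return true;
    };
  };
  inspect.AddTask(inspect_task("f0"));
  inspect.AddTask(inspect_task("f1"));
  inspect.End();
  while (!pool.empty()) {  // run the pretend asynchronous work to completion
    std::function<void()> work = std::move(pool.front());
    pool.pop_front();
    work();
  }
  return log;
}
```

In RunDemo the scan scheduler has a limit of two, so the row groups of the second fragment wait in the FIFO queue until earlier scans complete, and the scan stage only finishes after the inspect stage has ended it. A real implementation would also need to be thread safe, which this sketch deliberately leaves out.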



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
