[ https://issues.apache.org/jira/browse/ARROW-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-17350:
-----------------------------------
Labels: pull-request-available (was: )
> [C++] Create a scheduler for asynchronous work
> ----------------------------------------------
>
> Key: ARROW-17350
> URL: https://issues.apache.org/jira/browse/ARROW-17350
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Weston Pace
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Note, in the interest of keeping things simple, this ideally replaces the
> AsyncTaskGroup. This is needed to simplify the logic in ARROW-17287.
> The format and implementation will likely be inspired by the synchronous
> schedulers, TaskScheduler and TaskGroup but it will remain a separate
> implementation. In the future, when we dedicate time to improving our
> synchronous scheduler, we can decide if it makes sense to merge these two
> types.
> {noformat}
> /// A utility which keeps track of, and schedules, asynchronous tasks
> ///
> /// An asynchronous task has a synchronous component and an asynchronous component.
> /// The synchronous component typically schedules some kind of work on an external
> /// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
> /// resource like io_uring). The asynchronous part represents the work
> /// done on that external resource. Executing the synchronous part will be referred
> /// to as "submitting the task" since this usually includes submitting the asynchronous
> /// portion to the external thread pool.
> ///
> /// By default the scheduler will submit the task (execute the synchronous part) as
> /// soon as it is added, assuming the underlying thread pool hasn't terminated or the
> /// scheduler hasn't aborted. In this mode the scheduler is simply acting as
> /// a task group, keeping track of the ongoing work.
> ///
> /// This can be used to provide structured concurrency for asynchronous development.
> /// A task group created at a high level can be distributed amongst low level components
> /// which register work to be completed. The high level job can then wait for all work
> /// to be completed before cleaning up.
> ///
> /// A task scheduler must eventually be ended when all tasks have been added. Once the
> /// scheduler has been ended it is an error to add further tasks. Note, it is not an
> /// error to add additional tasks after a scheduler has aborted (though these tasks
> /// will be ignored and never submitted). The scheduler has a future which will complete
> /// once the scheduler has been ended AND all remaining tasks have finished executing.
> /// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
> ///
> /// Task failure (either the synchronous portion or the asynchronous portion) will cause
> /// the scheduler to enter an aborted state. The first such failure will be reported in
> /// the final task future.
> ///
> /// The scheduler can also be manually aborted. A cancellation status will be reported in
> /// the final task future.
> ///
> /// It is also possible to limit the number of concurrent tasks the scheduler will
> /// execute. This is done by setting a task limit. The task limit initially assumes all
> /// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
> /// on the total I/O cost of the task, or the expected RAM utilization of the task).
> ///
> /// When the total number of running tasks is limited then scheduler priority may also
> /// become a consideration. By default the scheduler runs with a FIFO queue but a custom
> /// task queue can be provided. One could, for example, use a priority queue to control
> /// the order in which tasks are executed.
> ///
> /// It is common to have multiple stages of execution. For example, when scanning, we
> /// first inspect each fragment (the inspect stage) to figure out the row groups and then
> /// we scan row groups (the scan stage) to read in the data. This sort of multi-stage
> /// execution should be represented as two separate task groups. The first task group can
> /// then have a custom finish callback which ends the second task group.
> {noformat}
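
To make the lifecycle described above concrete, here is a rough, single-threaded sketch of the add / end / abort / throttle behaviour. It is not the proposed Arrow implementation: ToyScheduler, Status, Done, Task and DemoThrottledRun are hypothetical names invented for illustration, the scheduler's future is replaced by a std::optional that is filled in on completion, the asynchronous part of a task is simulated by a callback the caller invokes later, and each task's cost is assumed not to exceed the capacity.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Toy model only; every name here is hypothetical and not part of Arrow.
struct Status {
  bool ok = true;
  std::string message;
};

class ToyScheduler {
 public:
  using Done = std::function<void(Status)>;  // invoked when the async part completes
  using Task = std::function<Status(Done)>;  // the synchronous "submit" part

  explicit ToyScheduler(int capacity) : capacity_(capacity) {}

  // Returns false if called after End(), which the description treats as an error.
  bool AddTask(Task task, int cost = 1) {
    if (ended_) return false;
    if (aborted_) return true;  // accepted but ignored and never submitted
    queue_.push_back({std::move(task), cost});
    SubmitWhileRoom();
    return true;
  }

  void End() {
    ended_ = true;
    MaybeFinish();
  }

  void Abort() {
    Fail({false, "cancelled"});
    MaybeFinish();
  }

  // Stand-in for the future: set once ended AND all running tasks have finished.
  const std::optional<Status>& finished() const { return finished_; }

 private:
  struct Entry {
    Task task;
    int cost;
  };

  // Submit queued tasks in FIFO order while the cost limit allows it.
  void SubmitWhileRoom() {
    while (!aborted_ && !queue_.empty() && in_use_ + queue_.front().cost <= capacity_) {
      Entry entry = std::move(queue_.front());
      queue_.pop_front();
      in_use_ += entry.cost;
      ++running_;
      const int cost = entry.cost;
      Status st = entry.task([this, cost](Status s) { OnDone(cost, std::move(s)); });
      if (!st.ok) OnDone(cost, std::move(st));  // the synchronous part failed
    }
  }

  void OnDone(int cost, Status st) {
    in_use_ -= cost;
    --running_;
    if (!st.ok) Fail(std::move(st));
    SubmitWhileRoom();
    MaybeFinish();
  }

  void Fail(Status st) {
    if (aborted_) return;  // only the first failure is reported
    aborted_ = true;
    first_error_ = std::move(st);
    queue_.clear();  // queued tasks are dropped, running tasks are left to finish
  }

  void MaybeFinish() {
    if (ended_ && running_ == 0 && queue_.empty() && !finished_) {
      finished_ = aborted_ ? first_error_ : Status{};
    }
  }

  int capacity_;
  int in_use_ = 0;
  int running_ = 0;
  bool ended_ = false;
  bool aborted_ = false;
  Status first_error_;
  std::deque<Entry> queue_;
  std::optional<Status> finished_;
};

// Usage: capacity 2 and three unit-cost tasks, so the third waits in the queue.
inline Status DemoThrottledRun() {
  ToyScheduler scheduler(/*capacity=*/2);
  std::deque<ToyScheduler::Done> external;  // stands in for an I/O thread pool
  auto task = [&external](ToyScheduler::Done done) {
    external.push_back(std::move(done));  // "submit" the asynchronous part
    return Status{};
  };
  for (int i = 0; i < 3; ++i) scheduler.AddTask(task);
  assert(external.size() == 2);   // the third task is throttled
  scheduler.End();
  assert(!scheduler.finished());  // ended, but tasks are still running
  external[0](Status{});          // frees capacity, so the third task is submitted
  assert(external.size() == 3);
  external[1](Status{});
  external[2](Status{});
  return *scheduler.finished();
}
```

In this sketch the scheduler must outlive every outstanding Done callback, and nothing is thread-safe; a real implementation would need locking and a proper future type. A two-stage scan could be modelled the same way with two ToyScheduler instances, where the code that observes the first one finishing calls End() on the second.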
--
This message was sent by Atlassian Jira
(v8.20.10#820010)