[ 
https://issues.apache.org/jira/browse/ARROW-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-17350:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Create a scheduler for asynchronous work
> ----------------------------------------------
>
>                 Key: ARROW-17350
>                 URL: https://issues.apache.org/jira/browse/ARROW-17350
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Note, in the interest of keeping things simple, this ideally replaces the 
> AsyncTaskGroup.  This is needed to simplify the logic in ARROW-17287.
> The format and implementation will likely be inspired by the synchronous 
> schedulers, TaskScheduler and TaskGroup but it will remain a separate 
> implementation.  In the future, when we dedicate time to improving our 
> synchronous scheduler, we can decide if it makes sense to merge these two 
> types.
> {noformat}
> /// A utility which keeps track of, and schedules, asynchronous tasks
> ///
> /// An asynchronous task has a synchronous component and an asynchronous component.
> /// The synchronous component typically schedules some kind of work on an external
> /// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
> /// resource like io_uring).  The asynchronous part represents the work
> /// done on that external resource.  Executing the synchronous part will be referred
> /// to as "submitting the task" since this usually includes submitting the asynchronous
> /// portion to the external thread pool.
> ///
> /// By default the scheduler will submit the task (execute the synchronous part) as
> /// soon as it is added, assuming the underlying thread pool hasn't terminated or the
> /// scheduler hasn't aborted.  In this mode the scheduler is simply acting as
> /// a task group, keeping track of the ongoing work.
> ///
> /// This can be used to provide structured concurrency for asynchronous development.
> /// A task group created at a high level can be distributed amongst low level components
> /// which register work to be completed.  The high level job can then wait for all work
> /// to be completed before cleaning up.
> ///
> /// A task scheduler must eventually be ended when all tasks have been added.  Once the
> /// scheduler has been ended it is an error to add further tasks.  Note, it is not an
> /// error to add additional tasks after a scheduler has aborted (though these tasks
> /// will be ignored and never submitted).  The scheduler has a future which will complete
> /// once the scheduler has been ended AND all remaining tasks have finished executing.
> /// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
> ///
> /// Task failure (either the synchronous portion or the asynchronous portion) will cause
> /// the scheduler to enter an aborted state.  The first such failure will be reported in
> /// the final task future.
> ///
> /// The scheduler can also be manually aborted.  A cancellation status will be reported in
> /// the final task future.
> ///
> /// It is also possible to limit the number of concurrent tasks the scheduler will
> /// execute. This is done by setting a task limit.  The task limit initially assumes all
> /// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
> /// on the total I/O cost of the task, or the expected RAM utilization of the task).
> ///
> /// When the total number of running tasks is limited then scheduler priority may also
> /// become a consideration.  By default the scheduler runs with a FIFO queue but a custom
> /// task queue can be provided.  One could, for example, use a priority queue to control
> /// the order in which tasks are executed.
> ///
> /// It is common to have multiple stages of execution.  For example, when scanning, we
> /// first inspect each fragment (the inspect stage) to figure out the row groups and then
> /// we scan row groups (the scan stage) to read in the data.  This sort of multi-stage
> /// execution should be represented as two separate task groups.  The first task group can
> /// then have a custom finish callback which ends the second task group.
> {noformat}
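
The semantics described in the comment above (submit on add, a task limit
with per-task cost, a FIFO queue, end vs. abort) can be illustrated with a
small single-threaded model.  This is only a sketch under stated
assumptions, not the proposed Arrow implementation: SketchScheduler, Done,
Task and Demo are hypothetical names, the "finished future" is reduced to a
finished() flag, and the asynchronous part of a task is modeled as a
callback the task must invoke when its work completes.  Letting an oversized
task run alone when nothing else is running is also an assumption, made here
to avoid deadlock.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded model; none of these names come from Arrow.
class SketchScheduler {
 public:
  // The synchronous part of a task receives `Done` and must call it
  // (possibly much later) when the asynchronous part completes.
  using Done = std::function<void(bool ok)>;
  using Task = std::function<void(Done)>;

  explicit SketchScheduler(int capacity) : capacity_(capacity) {}

  // Returns false when called after End() (an error per the description).
  // After an abort the task is accepted but ignored and never submitted.
  bool AddTask(Task task, int cost = 1) {
    if (ended_) return false;
    if (aborted_) return true;
    queue_.push_back({std::move(task), cost});
    SubmitWhileCapacity();
    return true;
  }

  void End() { ended_ = true; }  // does not flush or cancel queued tasks
  void Abort() {
    aborted_ = true;
    queue_.clear();  // queued tasks are dropped; running tasks still finish
  }

  bool aborted() const { return aborted_; }
  // Stand-in for the scheduler's future: ended AND nothing left to run.
  bool finished() const { return ended_ && running_ == 0 && queue_.empty(); }

 private:
  struct Entry {
    Task task;
    int cost;
  };

  // FIFO submission: run the head of the queue while it fits the limit.
  void SubmitWhileCapacity() {
    while (!aborted_ && !queue_.empty() &&
           (used_ == 0 || used_ + queue_.front().cost <= capacity_)) {
      Entry entry = std::move(queue_.front());
      queue_.pop_front();
      used_ += entry.cost;
      ++running_;
      int cost = entry.cost;
      // "Submitting the task": execute the synchronous part.
      entry.task([this, cost](bool ok) { OnDone(cost, ok); });
    }
  }

  void OnDone(int cost, bool ok) {
    used_ -= cost;
    --running_;
    if (!ok) Abort();  // a task failure puts the scheduler in the aborted state
    SubmitWhileCapacity();
  }

  int capacity_;
  int used_ = 0;
  int running_ = 0;
  bool ended_ = false;
  bool aborted_ = false;
  std::deque<Entry> queue_;
};

// Three unit-cost tasks with a limit of two; returns a log of what happened.
std::string Demo() {
  SketchScheduler scheduler(/*capacity=*/2);
  std::string log;
  std::vector<SketchScheduler::Done> pending;  // stands in for the external resource
  auto complete = [&pending](std::size_t i, bool ok) {
    // Move the callback out first: completing a task may append to `pending`.
    SketchScheduler::Done done = std::move(pending[i]);
    done(ok);
  };
  for (char name : {'a', 'b', 'c'}) {
    scheduler.AddTask([&log, &pending, name](SketchScheduler::Done done) {
      log += name;                         // synchronous part
      pending.push_back(std::move(done));  // asynchronous part completes later
    });
  }
  log += '|';  // only a and b have been submitted at this point
  scheduler.End();
  complete(0, true);  // a finishes, which frees capacity for c
  log += scheduler.finished() ? 'F' : '-';
  complete(1, true);
  complete(2, true);
  log += scheduler.finished() ? 'F' : '-';
  return log;
}
```

Demo() returns "ab|c-F": a and b are submitted immediately, c waits in the
FIFO queue until a completes, and the scheduler only reports finished once it
has been ended and all three tasks are done.  The scheduler must outlive any
outstanding Done callbacks, since they capture a pointer to it.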



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
