[
https://issues.apache.org/jira/browse/ARROW-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-16498:
-----------------------------------
Labels: pull-request-available (was: )
> [C++] Fix potential deadlock in arrow::compute::TaskScheduler
> -------------------------------------------------------------
>
> Key: ARROW-16498
> URL: https://issues.apache.org/jira/browse/ARROW-16498
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Weston Pace
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> An extremely simplified version of the task scheduler's ScheduleMore method
> looks something like:
> {noformat}
> void ScheduleMore(int num_to_schedule) {
>   tasks_that_need_running_.fetch_add(num_to_schedule);
>   if (!weak_lock.lock()) {
>     // If someone else is scheduling then return early
>     return;
>   }
>   auto tasks = PickTasks();
>   weak_lock.unlock();
> }
> {noformat}
> It is possible for one thread to have the lock, and find 0 tasks. But then,
> before it gives up the lock, another thread adds tasks and fails to acquire
> the lock. Neither thread will schedule anything even though there are tasks
> to run. This can lead to deadlock.
> The proposed PR changes the logic to (still extremely simplified):
> {noformat}
> void ScheduleMore(int num_to_schedule) {
>   tasks_that_need_running_.fetch_add(num_to_schedule);
>   tasks_added_recently.store(true);
>   if (!weak_lock.lock()) {
>     // If someone else is scheduling then return early
>     return;
>   }
>   auto tasks = PickTasks();
>   // compare_exchange_strong takes the expected value by reference
>   bool expected = true;
>   if (tasks_added_recently.compare_exchange_strong(expected, false)) {
>     if (tasks.empty()) {
>       ScheduleMore();
>     }
>   }
>   weak_lock.unlock();
> }
> {noformat}
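For anyone who wants to replay the interleaving from the description without real threads, here is a minimal single-threaded sketch. It is not the scheduler's code and not the PR's code. MiniScheduler, ReplayRace, recheck_after_unlock and while_locked_hook are hypothetical names made up for illustration, the weak lock is modelled with a std::atomic<bool>, and the re-check is a variation on the proposal: the flag is cleared before picking tasks and read again after the lock is released, with a loop instead of recursion.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical miniature of the scheduler. Member names echo the simplified
// snippets in the issue; none of this is the real TaskScheduler code.
class MiniScheduler {
 public:
  explicit MiniScheduler(bool recheck_after_unlock)
      : recheck_after_unlock_(recheck_after_unlock) {}

  void ScheduleMore(int num_to_schedule) {
    assert(num_to_schedule >= 0);
    tasks_that_need_running_.fetch_add(num_to_schedule);
    tasks_added_recently_.store(true);
    for (;;) {
      // Try-lock: exchange returns the previous value, so `true` means
      // another caller is already scheduling and we return early.
      if (scheduling_.exchange(true)) return;
      // Clear the flag before picking: a task added after this point sets
      // it again and is noticed by the re-check after the unlock.
      tasks_added_recently_.store(false);
      scheduled_.fetch_add(PickTasks());
      RunHookOnce();  // test-only: "another thread adds tasks right now"
      scheduling_.store(false);  // unlock
      if (!recheck_after_unlock_ || !tasks_added_recently_.load()) return;
    }
  }

  int scheduled() const { return scheduled_.load(); }
  int pending() const { return tasks_that_need_running_.load(); }

  // Test-only hook, invoked at most once while the lock is held.
  std::function<void()> while_locked_hook;

 private:
  // Takes every pending task; returns how many were taken.
  int PickTasks() { return tasks_that_need_running_.exchange(0); }

  void RunHookOnce() {
    if (!while_locked_hook) return;
    auto hook = std::move(while_locked_hook);
    while_locked_hook = nullptr;
    hook();
  }

  const bool recheck_after_unlock_;
  std::atomic<int> tasks_that_need_running_{0};
  std::atomic<int> scheduled_{0};
  std::atomic<bool> tasks_added_recently_{false};
  std::atomic<bool> scheduling_{false};
};

// Replays the interleaving from the description on one thread and returns
// the number of tasks that were left unscheduled.
int ReplayRace(bool recheck_after_unlock) {
  MiniScheduler s(recheck_after_unlock);
  // The "other thread": adds 3 tasks while the lock is held, fails to
  // acquire the lock, and returns early.
  s.while_locked_hook = [&s] { s.ScheduleMore(3); };
  s.ScheduleMore(0);  // the lock holder, which finds 0 tasks
  return s.pending();
}
```

Under these assumptions, ReplayRace(false) leaves the 3 tasks stranded, which is the situation described above, while ReplayRace(true) picks them up on the second pass through the loop. The sketch relies on the default sequentially consistent ordering of the atomics; whether a weaker ordering would be enough is not something it tries to answer.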
--
This message was sent by Atlassian Jira
(v8.20.7#820007)