felipecrv commented on issue #40301:
URL: https://github.com/apache/arrow/issues/40301#issuecomment-2043635169
Sharing some progress here in the open:
> the elegant solution is work-stealing. Now you know why work-stealing was
> invented.
@anjakefala will test a version of this that uses a `ParallelFor` [1] which
doesn't create one `Future` per task/column; instead, each worker keeps going
by popping the next task (column index) off a shared `std::atomic<int>` called
`next_task`. This single atomic counter plays the same role as the per-thread
task queues in a generic work-stealing implementation.
BTW, this is pseudo-code for generic Work Stealing:
```python
function WorkerThread:
    while true:
        if own_task_queue is not empty:
            task = own_task_queue.pop()
            execute(task)
        else:
            # Try to steal work from another thread
            victim = randomly_select_thread()
            stolen_task = victim.steal_task()
            if stolen_task is not None:
                execute(stolen_task)
            else:
                # No work available, terminate
                return

function steal_task:
    if victim_task_queue is not empty:
        return victim_task_queue.pop()
    else:
        return None
```
In `ParallelForWithBackoff` [1], each worker starts from a task/column it
should complete (think of that as its `own_task_queue`) and, when done, keeps
going as long as `next_task.fetch_add(1, std::memory_order_acq_rel)` returns a
task index smaller than `num_tasks`. Atomically fetching and incrementing that
counter is how a worker "steals" the task from all the other workers. This
keeps every thread busy and reduces thread-local allocations, because the same
thread can very quickly go through many columns.
[1] https://github.com/felipecrv/arrow/blob/b54cb408a8c73af0c4fc76cdceadca0c0b240cb0/cpp/src/arrow/util/parallel.h#L104-L229
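For illustration, here is a minimal standalone sketch of that claim-by-`fetch_add`
pattern using plain `std::thread` (not Arrow's thread pool and not the actual
`ParallelForWithBackoff` signature; `ParallelForSketch` and `task_fn` are made-up
names):
```cpp
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Runs task_fn(i) for i in [0, num_tasks) across num_workers threads.
void ParallelForSketch(int num_tasks, int num_workers,
                       const std::function<void(int)>& task_fn) {
  // The shared counter replaces the per-thread queues of a generic
  // work-stealing scheduler: claiming ("stealing") a task is one fetch_add.
  // Tasks 0..num_workers-1 are handed out as each worker's starting task.
  std::atomic<int> next_task{num_workers};

  std::vector<std::thread> workers;
  workers.reserve(num_workers);
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&, w] {
      int task = w;  // each worker begins with its own initial task
      while (task < num_tasks) {
        task_fn(task);
        // Claim the next unprocessed task from the shared pool.
        task = next_task.fetch_add(1, std::memory_order_acq_rel);
      }
    });
  }
  for (auto& t : workers) t.join();
}

int main() {
  ParallelForSketch(/*num_tasks=*/16, /*num_workers=*/4, [](int column) {
    std::printf("processing column %d\n", column);
  });
}
```
Whether each worker seeds itself with its own index or goes straight to the
counter is mostly cosmetic; the point is that "stealing" degenerates into a
single acquire-release `fetch_add` on one shared integer.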