romainfrancois commented on pull request #9615: URL: https://github.com/apache/arrow/pull/9615#issuecomment-828472890
Marking this as ready to review. I've changed the approach this week so that it no longer needs to resort to locking.

This introduces the `RTasks` class, which factors out the handling of tasks that can run in parallel and tasks that cannot (because they might touch the R central resource, e.g. protect an R object ...). It has `void Append(bool parallel, Task&& task)` to add a task. Depending on `parallel`, the task is either added to the parallel task group, and potentially started immediately, or delayed until all the tasks have been added. Then it has `Finish()`, which 1) runs the tasks that have been delayed, and 2) waits for the parallel tasks to finish.

With this, the `RConverter` class gained `virtual void DelayedExtend(SEXP values, int64_t size, RTasks& tasks)`. The idea is that an implementation might first do some setup work that has to happen on the main thread because it uses central R resources, while the bulk of the work is then either run in parallel if possible or delayed. The `RStructConverter` implementation is a good example: it has to do some work upfront but can still benefit from parallel ingestion of its columns.
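The `Append`/`Finish` split described above can be illustrated with a minimal, standalone C++ sketch. This is not the code from the PR: `RTasksSketch`, `IngestResult`, and `IngestColumns` are hypothetical names, `Task` is assumed to be a plain `std::function<void()>`, and `std::async` stands in for whatever parallel task group the PR actually uses. The per-column loop only loosely mirrors the struct-converter case of upfront setup followed by per-column tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Assumption: a task is a plain callable; the Task type in the PR may differ.
using Task = std::function<void()>;

// Hypothetical sketch of the RTasks idea: parallel-safe tasks may start
// immediately, other tasks are delayed and run on the thread calling Finish().
class RTasksSketch {
 public:
  void Append(bool parallel, Task&& task) {
    if (parallel) {
      // std::launch::async starts the task on another thread right away.
      parallel_tasks_.push_back(
          std::async(std::launch::async, std::move(task)));
    } else {
      delayed_tasks_.push_back(std::move(task));
    }
  }

  void Finish() {
    // 1) Run the delayed tasks, in insertion order, on the calling thread.
    for (auto& task : delayed_tasks_) task();
    delayed_tasks_.clear();
    // 2) Wait for the parallel tasks; get() rethrows a task's exception.
    for (auto& pending : parallel_tasks_) pending.get();
    parallel_tasks_.clear();
  }

 private:
  std::vector<std::future<void>> parallel_tasks_;
  std::vector<Task> delayed_tasks_;
};

struct IngestResult {
  std::vector<int> column_sums;
  std::vector<std::thread::id> ran_on;  // thread that processed each column
};

// Hypothetical analogue of a struct-like DelayedExtend: setup happens on the
// calling thread, then one task per column is appended.
IngestResult IngestColumns(const std::vector<std::vector<int>>& columns,
                           const std::vector<bool>& parallel_safe) {
  IngestResult result;
  // Upfront work on the calling thread: allocate one output slot per column.
  result.column_sums.assign(columns.size(), 0);
  result.ran_on.assign(columns.size(), std::thread::id());

  RTasksSketch tasks;
  for (std::size_t i = 0; i < columns.size(); ++i) {
    tasks.Append(parallel_safe[i], [&result, &columns, i] {
      int sum = 0;
      for (int value : columns[i]) sum += value;
      // Each task writes only to its own slot, so no locking is needed.
      result.column_sums[i] = sum;
      result.ran_on[i] = std::this_thread::get_id();
    });
  }
  tasks.Finish();  // all tasks are done before result is returned
  return result;
}
```

Calling `IngestColumns(columns, {true, false, true})` would process the first and third columns through the parallel path, while the second column is only processed inside `Finish()` on the calling thread, which is the property that matters for work touching R's central resources.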
