Hi all,

We've had some discussions in the past about our approach to nested parallelism (for example, reading multiple Parquet, CSV, or compressed Arrow IPC files in parallel, where each file read can benefit from internal parallelism for faster parsing and decoding). Since then, there has been a lot of work on asynchronous IO and futures in the C++ project: the Parquet and CSV readers have been refactored to use composable futures, and an Executor interface has been introduced.
Relatedly, I saw that Weston has been working on a work-stealing thread pool implementation [1]. I've read discussions in other projects about the nuances of multicore work scheduling, for example in Rust's Tokio asynchronous runtime [2]. It seems to me that adopting an executor queue per logical CPU core together with a work-stealing scheduler is a reasonable path toward solving the nested parallelism problem. There are some nuances around the execution ordering of tasks on a single CPU core: child/dependent tasks should be given priority and not be preempted by sibling tasks of their parent.

I'm curious what people think about a high-level plan going forward, and what the corresponding follow-up projects are to ensure that each place in the codebase where we spawn child tasks (e.g. when reading many Parquet or CSV files in parallel) will "compose" correctly, yielding desirable task execution ordering and balanced execution across CPU cores.

Thanks
Wes

[1]: https://github.com/apache/arrow/pull/10420
[2]: http://tokio.rs/blog/2019-10-scheduler