hi all,

We've had some discussions in the past about our approach to nested
parallelism (for example, reading multiple Parquet, CSV, or compressed
Arrow IPC files in parallel, where each file read can itself benefit
from internal parallelism for faster parsing and decoding). Since
then, there has been a lot of work on asynchronous IO and futures in
the C++ project: the Parquet and CSV readers have been refactored to
use composable futures, and an Executor interface has been introduced.

Relatedly, I saw that Weston had worked on a work-stealing thread pool
implementation [1]. I've read discussions in other projects on the
nuances around multicore work scheduling, for example in Rust's Tokio
asynchronous runtime [2].

It seems to me that adopting an executor queue per logical CPU core
together with a work-stealing scheduler is a reasonable path forward
to solve the nested parallelism problem. There are some nuances around
the execution ordering of tasks within the context of a single CPU
core (i.e. child/dependent tasks should be given priority and not be
preempted by sibling tasks of the parent task).

I'm curious what people think about a high-level plan going forward,
and what the corresponding follow-up projects would be to make sure
that each place in the codebase where we provide for spawning child
tasks (e.g. when reading many Parquet or CSV files in parallel)
"composes" correctly to yield the desired task-execution ordering and
balanced execution across CPU cores.

Thanks
Wes

[1]: https://github.com/apache/arrow/pull/10420
[2]: http://tokio.rs/blog/2019-10-scheduler
