In light of ARROW-9924, I wanted to rekindle the discussion about our
approach to multithreading (especially the _programming model_) in
C++. We discussed this roughly six months ago, and as I recall there
were more discussions in summer 2019.

Realistically, we are going to be consistently dealing with
independent concurrent in-process workloads, each of which can go
faster with multithreading. These could be things like:

* Reading file formats (CSV, Parquet, etc.) that benefit from
multithreaded parsing/decoding
* Reading one or more files in parallel using the Datasets API
* Executing any number of multithreaded analytical workloads

One obvious issue with our thread scheduling is the FIFO nature of the
global thread pool. If a new independent multithreaded workload shows
up, it has to wait for other workloads to complete before the new work
will be scheduled. Think about a Flight server serving queries to
users -- is it fair for one query to "hog" the thread pool and force
other requests to wait until they can get access to some CPU
resources? You could imagine a workload that spawns 10 minutes worth
of CPU work, where a new workload has to wait for all of that work to
complete before having any tasks scheduled for execution.

The approach that's been taken in the Datasets API to avoid problems
with nested parallelism (file-specific operations spawning multiple
tasks onto the global thread pool) is simply to disable multithreading
at the level of a single file. This is clearly suboptimal.

We have an additional problem in that some file-loading tasks do a
mixture of CPU work and IO work: once a thread has been dispatched to
execute such a task, a CPU core may sit underutilized whenever the
task is waiting on IO.

There are more aspects we can discuss, but in general I think we need
to come up with a programming model for building our C++ system
components with the following requirements:

* Deadlocks not possible by design
* Any component can safely use "nested parallelism" without the
programmer having to worry about deadlocks or one task "hogging" the
thread pool. In other words, if there's only a single
multithreading-capable workload running, we "let it rip"
* Resources can be reasonably fairly allocated amongst concurrent
workloads (think: independent requests coming in through Flight, or
scan tasks on different Parquet files in the Datasets API). Limit
scenarios where a new workload is blocked altogether on the completion
of other workloads
* A well-defined programming pattern for tasks that do a mixture of
CPU work and IO work that allows CPU cores to be used when a task is
waiting on IO

We can't be the only project that has these problems, so I'm
interested to see what solutions have been successfully employed by
others. For example, it strikes me as similar to concurrency issues
inside an analytic database. How do they prevent starvation among
concurrent workloads, or handle CPU/IO task scheduling to avoid CPU
underutilization?

Choices of which threading libraries we might use to implement a
viable solution (e.g. TBB) seem secondary to the programming model
that we use to implement our components.

Thanks,
Wes
