RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

Malakhov, Anton Fri, 03 May 2019 09:30:51 -0700

Thanks for your answers,

> -----Original Message-----
> From: Antoine Pitrou [mailto:anto...@python.org]
> Sent: Friday, May 3, 2019 03:54


> Le 03/05/2019 à 05:47, Jed Brown a écrit :
> > I would caution to please not commit to the MKL/BLAS model in which
I'm actually talking about the threading-layer model, in which MKL supports
several OpenMP runtimes (Intel, GNU, PGI) and TBB, as well as a non-threaded
version. It even supports dynamic selection; please refer to:
https://software.intel.com/en-us/mkl-macos-developer-guide-dynamically-selecting-the-interface-and-threading-layer
We implemented the same approach in Numba (#2245):
https://numba.pydata.org/numba-doc/dev/user/threading-layer.html
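
To make the threading-layer idea concrete, here is a minimal sketch of
selecting a layer by name at run time. It assumes a hypothetical interface:
ThreadingLayer, SerialLayer, StdThreadLayer and MakeThreadingLayer are
illustrative names only, not MKL, Numba, or Arrow APIs, and the thread count
of 4 is an arbitrary placeholder. A real layer would wrap TBB or an OpenMP
runtime rather than raw std::thread.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical interface: one parallel primitive, several interchangeable backends.
class ThreadingLayer {
 public:
  virtual ~ThreadingLayer() = default;
  virtual std::string name() const = 0;
  // Calls body(i) once for every i in [0, n); returns after all calls finish.
  virtual void ParallelFor(std::size_t n,
                           const std::function<void(std::size_t)>& body) = 0;
};

// Non-threaded layer: runs everything on the calling thread.
class SerialLayer : public ThreadingLayer {
 public:
  std::string name() const override { return "serial"; }
  void ParallelFor(std::size_t n,
                   const std::function<void(std::size_t)>& body) override {
    for (std::size_t i = 0; i < n; ++i) body(i);
  }
};

// Stand-in for a real runtime (TBB, OpenMP): strided split over std::thread.
class StdThreadLayer : public ThreadingLayer {
 public:
  explicit StdThreadLayer(std::size_t num_threads)
      : num_threads_(num_threads ? num_threads : 1) {}
  std::string name() const override { return "std_thread"; }
  void ParallelFor(std::size_t n,
                   const std::function<void(std::size_t)>& body) override {
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads_; ++t) {
      // Worker t handles indices t, t + num_threads_, t + 2 * num_threads_, ...
      workers.emplace_back([&, t] {
        for (std::size_t i = t; i < n; i += num_threads_) body(i);
      });
    }
    for (auto& w : workers) w.join();
  }

 private:
  std::size_t num_threads_;
};

// Dynamic selection: the layer name could come from an environment variable
// or a configuration option (an assumption for this sketch).
std::unique_ptr<ThreadingLayer> MakeThreadingLayer(const std::string& name) {
  if (name == "serial") return std::make_unique<SerialLayer>();
  if (name == "std_thread") return std::make_unique<StdThreadLayer>(4);
  throw std::invalid_argument("unknown threading layer: " + name);
}
```

Calling code only sees ThreadingLayer, so switching the backend is a matter
of passing a different name to MakeThreadingLayer.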

> > the library creates threads internally.  It's a disaster for managing
> > oversubscription and affinity issues among groups of threads and/or
> > multiple processes (e.g., MPI).
This is exactly what I mean when I refer to issues with threading
composability! OpenMP is not easy to use inside a library. I described the
problem in this document:
https://cwiki.apache.org/confluence/display/ARROW/Parallel+Execution+Engine

> Implicit multi-threading is important for user-friendliness reasons
> (especially in higher-level bindings such as the Python-bindings).
I couldn't agree more! There might not be enough parallelism at the
application level, so adding parallelism from DSLs is important for better CPU
utilization, but it is also tricky because of these incompatibility issues.

> > The library is then free to use constructs like omp taskgroup/taskloop
> > as granularity warrants; it will never utilize threads that the
> > application didn't explicitly give it.
> 
> I don't think we're planning to use OpenMP in Arrow, though Wes probably has a
> better answer.
I wouldn't exclude OpenMP from consideration completely. I want to start with
TBB, but nothing composes better with OpenMP than OpenMP itself. MKL itself
(and thus NumPy built on it) defaults to OpenMP threading. BTW, there is no
longer a compatibility layer between TBB and OpenMP; it was removed from the
latter.
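
For reference, the model quoted above, in which the library "will never
utilize threads that the application didn't explicitly give it", can be
sketched roughly as follows. This is only an illustration under stated
assumptions: Executor, InlineExecutor and ChunkedSum are hypothetical names,
not Arrow, TBB, or OpenMP APIs. The library routine only describes the work
as tasks; where and on how many threads they run is decided by the executor
the caller passes in.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// Hypothetical executor owned by the application, not by the library.
struct Executor {
  virtual ~Executor() = default;
  // Runs every task and returns only after all of them have finished.
  virtual void RunAll(std::vector<Task> tasks) = 0;
};

// Simplest possible executor: run the tasks one by one on the calling thread.
// An application could instead forward them to its own pool or arena.
struct InlineExecutor : Executor {
  void RunAll(std::vector<Task> tasks) override {
    for (auto& task : tasks) task();
  }
};

// "Library" routine: splits the input into chunks and creates one task per
// chunk. It never creates a thread itself.
long long ChunkedSum(const std::vector<int>& data, std::size_t num_chunks,
                     Executor& exec) {
  if (num_chunks == 0) num_chunks = 1;
  std::vector<long long> partial(num_chunks, 0);
  std::vector<Task> tasks;
  // Ceiling division so that the chunks cover the whole input.
  const std::size_t chunk = (data.size() + num_chunks - 1) / num_chunks;
  for (std::size_t c = 0; c < num_chunks; ++c) {
    const std::size_t begin = std::min(c * chunk, data.size());
    const std::size_t end = std::min(begin + chunk, data.size());
    // Each task writes only its own slot of `partial`, so tasks do not race
    // even if the executor runs them concurrently.
    tasks.push_back([&data, &partial, c, begin, end] {
      partial[c] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
    });
  }
  exec.RunAll(std::move(tasks));
  return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

With this shape, oversubscription and affinity stay under the control of
whoever owns the executor, which is the point of the quoted suggestion.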


> -----Original Message-----
> From: Antoine Pitrou [mailto:anto...@python.org]
> Sent: Friday, May 3, 2019 03:49
> 
> Another possibility is to look at our C++ CSV reader and parser (in
> src/arrow/csv).  It's the only piece of Arrow that uses non-trivial
> multi-threading right now (with tasks spawning new tasks dynamically, see
> InferringColumnBuilder).  It's based on the ThreadPool and TaskGroup APIs (in
> src/arrow/util/).  These APIs are not set in stone, so you're free to propose
> changes to make them fit better with a TBB-based implementation.
Great! This is what I was looking for!


// Anton
