https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #18 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Jerry DeLisle from comment #17)
> I have done some experimenting.  Since gcc supports OMP and I think to some
> extent ACC why not come up with a MATMUL that exploits these if present?  On
> the darwin platform discussed in comment #12, the performance is excellent. 
> Does darwin implementation provided exploit OpenCL?  What is it using?  Why
> not enable that on other platforms if present.
> 
> I am going to explore OpenCL and clBLAS to see if I can get it to work.  If
> I am successful, I would like to hide it behind MATMUL if possible.  Any
> other opinions?

Yes, this is tricky. In multithreaded code that calls matmul, what is the
strategy (nested parallelism, serial, ...)? We usually link in a serial BLAS
because threading inside the library is usually bad for the overall
performance of the code, i.e. nested parallelism tends to perform badly. Also,
how many threads would you use by default (depending on matrix size, machine
load)? Users on an N-core machine might run N jobs in parallel and not expect
each of them to start several threads.
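
To make the questions above concrete, here is a minimal sketch in C of the
kind of decision a threaded MATMUL would have to make. Everything in it is an
assumption for illustration: the helper name matmul_thread_count, its
parameters, and the cutoff of 1e6 multiply-adds per thread are hypothetical
and are not taken from libgfortran or any BLAS. It stays serial when the
caller is already inside a parallel region and otherwise scales the thread
count with the amount of work. It does not address machine load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libgfortran or any BLAS library.
   Returns how many threads a threaded matmul might use for an
   (m x k) * (k x n) product. */
static int matmul_thread_count(size_t m, size_t n, size_t k,
                               int in_parallel_region, int max_threads)
{
    /* Assumed cutoff: each thread should get at least this many
       multiply-adds. The value is a placeholder, not a measurement. */
    const double min_work_per_thread = 1.0e6;

    assert(max_threads >= 1);

    /* The caller is already running in parallel: stay serial so that
       no nested parallelism is created. */
    if (in_parallel_region)
        return 1;

    /* Approximate work as the number of multiply-adds. */
    double work = (double)m * (double)n * (double)k;
    double wanted = work / min_work_per_thread;

    if (wanted < 1.0)
        return 1;                 /* too small to be worth threading */
    if (wanted > (double)max_threads)
        return max_threads;       /* never exceed what the caller allows */
    return (int)wanted;
}
```

In an OpenMP program the last two arguments could come from omp_in_parallel()
and omp_get_max_threads(). Capping at max_threads leaves control with the
user, so someone running N independent jobs on an N-core machine can still
force serial execution by allowing only one thread.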

Maybe this could be tied to the auto-parallelization (or a similar) option
that gcc already has?
