https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #20 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Jerry DeLisle from comment #19)
> If I can get something working I am thinking something like
> -fexternal-blas-n, if -n not given then default to current libblas
> behaviour. This way users have some control. With GPUs, it is not unusual to
> have hundreds of cores.  We can also, at run time, see if the opencl is
> already initialized which may mean used elsewhere so don't mess with it.

Hiding this behind a -fexternal-blas-n switch might be an option. Including GPUs
seems a tad trickier still. We have a paper on GPU (small) matrix
multiplication: http://dbcsr.cp2k.org/_media/gpu_book_chapter_submitted.pdf .
BTW, another interesting project is the libxsmm library, which is aimed more at
small (<128) matrices; see https://github.com/hfp/libxsmm . Not sure if this
info is useful in this context, but it might provide inspiration.
