On Wed, Dec 20, 2017 at 07:15:16PM +0100, Ricardo Wurmus wrote:
> Is it really single-threaded?  I remember having a couple of problems
> with OpenBLAS on our cluster when it is used with Numpy as both would
> spawn lots of threads.  The solution was to limit OpenBLAS to at most
> two threads.

Looks like 1 on my system.

> > If I compile for a target it
> > makes a large difference.
>
> The FAQ document[1] says this:
>
> The environment variable which control the kernel selection is
> OPENBLAS_CORETYPE (see driver/others/dynamic.c) e.g. export
> OPENBLAS_CORETYPE=Haswell. And the function char*
> openblas_get_corename() returns the used target.
>
> [1]: https://github.com/xianyi/OpenBLAS/wiki/Faq
>
> Have you tried this and compared the performance?

About 10x difference on 24+ cores for matrix multiplication (my
version vs what comes with Guix).

I do think we need to default to a conservative openblas for general
use. Question is how we make it fly on dedicated hardware.

package python-numpy:openblas-haswellp for the parallel version? Also
for R and others. Problem is that we blow up the types of packages.

Pj.
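
P.S. For anyone who wants to compare the two setups, here is a rough
sketch in Python, not a tested recipe. It assumes an OpenBLAS shared
library that ctypes can locate, and as far as I know OPENBLAS_CORETYPE
only has an effect on builds with dynamic kernel selection. The helper
names openblas_env and openblas_info are made up for this example;
OPENBLAS_NUM_THREADS, openblas_get_corename() and
openblas_get_num_threads() come from OpenBLAS itself.

```python
import ctypes
import ctypes.util
import os


def openblas_env(coretype=None, num_threads=None, base=None):
    """Return a copy of the environment with OpenBLAS settings applied.

    The variables are read when the library is loaded, so they are meant
    for a child process rather than the already running interpreter.
    """
    env = dict(os.environ if base is None else base)
    if coretype is not None:
        env["OPENBLAS_CORETYPE"] = coretype
    if num_threads is not None:
        env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


def openblas_info(libname=None):
    """Return (corename, num_threads), or None if OpenBLAS cannot be loaded."""
    # Assumption: the library is discoverable under the name "openblas".
    path = libname or ctypes.util.find_library("openblas")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        lib.openblas_get_corename.restype = ctypes.c_char_p
        lib.openblas_get_num_threads.restype = ctypes.c_int
        corename = lib.openblas_get_corename().decode()
        return corename, lib.openblas_get_num_threads()
    except (OSError, AttributeError):
        # Library missing, or it does not export the expected symbols.
        return None
```

A benchmark could then be started with something like
subprocess.run([sys.executable, "bench.py"], env=openblas_env("Haswell", 2)),
where bench.py is a placeholder for whatever matrix multiplication test
is being timed, and openblas_info() called inside it to confirm which
kernel and thread count were actually picked up.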
