Hi Pjotr,

> I was just stating that the default openblas package does not perform
> well (it is single threaded, for one).

Is it really single-threaded?  I remember having a couple of problems
with OpenBLAS on our cluster when it was used with NumPy, as both would
spawn lots of threads.  The solution was to limit OpenBLAS to at most
two threads.
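
For illustration, here is a minimal sketch of how such a limit can be
applied per job.  It assumes the cap is set through the
OPENBLAS_NUM_THREADS environment variable, which OpenBLAS reads at
start-up; the helper name openblas_env is made up for this example and
is not necessarily what we used on the cluster.

```python
import os


def openblas_env(max_threads, base_env=None):
    """Return a copy of the environment with OpenBLAS capped at max_threads.

    openblas_env is a hypothetical helper; OPENBLAS_NUM_THREADS is the
    variable OpenBLAS consults when it sets up its thread pool.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = str(max_threads)
    return env
```

The resulting mapping can be passed as env= to subprocess.run when
launching the NumPy job, so the limit is in place before OpenBLAS is
loaded.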

> If I compile for a target it
> makes a large difference.

The FAQ document[1] says this:

  The environment variable which control the kernel selection is
  OPENBLAS_CORETYPE (see driver/others/dynamic.c) e.g. export
  OPENBLAS_CORETYPE=Haswell. And the function char*
  openblas_get_corename() returns the used target.

[1]: https://github.com/xianyi/OpenBLAS/wiki/Faq

Have you tried this and compared the performance?
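
In case it helps, below is a rough sketch of how such a comparison
could be run.  It rests on several assumptions: that NumPy in the
child process is linked against an OpenBLAS built with runtime kernel
selection (as far as I understand, OPENBLAS_CORETYPE has no effect
otherwise), that the chosen core type is actually supported by the CPU,
and that lib_path points at the shared library you want to inspect.
The helper names and the matrix size are made up for the example.

```python
import ctypes
import os
import subprocess
import sys

# Tiny benchmark executed in a child process, so that OPENBLAS_CORETYPE
# is already set when OpenBLAS is loaded.  The matrix size is arbitrary.
BENCH = r"""
import time
import numpy as np
a = np.random.rand(1500, 1500)
start = time.perf_counter()
a @ a
print(time.perf_counter() - start)
"""


def coretype_env(coretype, base_env=None):
    """Return a copy of the environment with OPENBLAS_CORETYPE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_CORETYPE"] = coretype
    return env


def time_matmul(coretype):
    """Run the benchmark with the given core type; return seconds taken."""
    result = subprocess.run(
        [sys.executable, "-c", BENCH],
        env=coretype_env(coretype),
        check=True,
        capture_output=True,
        text=True,
    )
    return float(result.stdout.strip())


def corename(lib_path):
    """Ask the OpenBLAS library at lib_path which kernel target it uses."""
    lib = ctypes.CDLL(lib_path)
    # The FAQ gives the signature as: char* openblas_get_corename()
    lib.openblas_get_corename.restype = ctypes.c_char_p
    return lib.openblas_get_corename().decode()
```

Calling time_matmul("Haswell") and time_matmul with a more generic
core type, then comparing the two numbers, should show whether kernel
selection alone recovers the difference you see when compiling for a
target.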

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

