Hi Pjotr,

> I was just stating that the default openblas package does not perform
> well (it is single threaded, for one).
Is it really single-threaded?  I remember having a couple of problems
with OpenBLAS on our cluster when it is used with Numpy, as both would
spawn lots of threads.  The solution was to limit OpenBLAS to at most
two threads.

> If I compile for a target it
> makes a large difference.

The FAQ document[1] says this:

    The environment variable which control the kernel selection is
    OPENBLAS_CORETYPE (see driver/others/dynamic.c) e.g. export
    OPENBLAS_CORETYPE=Haswell. And the function char*
    openblas_get_corename() returns the used target.

[1]: https://github.com/xianyi/OpenBLAS/wiki/Faq

Have you tried this and compared the performance?

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
