On 29 October 2015 at 20:25, Julian Taylor <jtaylor.deb...@googlemail.com>
wrote:

> should be possible by putting this into: ~/.numpy-site.cfg
>
> [openblas]
> libraries = openblasp
>
> LD_PRELOAD the file should also work.
>
>
Thanks!
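As a side note, one way to confirm which BLAS NumPy was built against is to inspect its build configuration (though with LD_PRELOAD the library actually loaded at runtime may differ from what is reported):

```python
import numpy as np

# Print the BLAS/LAPACK libraries NumPy was linked against at build time.
# With LD_PRELOAD, the symbols actually resolved at runtime can come from
# a different library than the one shown here.
np.__config__.show()
```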

I did some timings on a dot product of a square matrix of size 10000,
LD_PRELOADing the different versions. I checked that all the cores were
crunching whenever a version other than plain libopenblas/64 was selected.
Here are the timings in seconds:


Intel i5-3317U:
/usr/lib64/libopenblaso.so     86.3651878834
/usr/lib64/libopenblasp64.so   96.8817200661
/usr/lib64/libopenblas.so     114.60265708
/usr/lib64/libopenblasp.so    107.927740097
/usr/lib64/libopenblaso64.so   97.5418870449
/usr/lib64/libopenblas64.so   109.000799179

Intel i7-4770:
/usr/lib64/libopenblas.so      37.9794859886
/usr/lib64/libopenblasp.so     12.3455951214
/usr/lib64/libopenblas64.so    38.0571939945
/usr/lib64/libopenblasp64.so   12.5558650494
/usr/lib64/libopenblaso64.so   12.4118559361
/usr/lib64/libopenblaso.so     13.4787950516

Both computers have the same software and OS. So, it seems that OpenBLAS
doesn't get a significant advantage from going parallel on the older i5,
while the i7, using all its cores (4 physical + 4 hyperthreaded), gains a
roughly 3x speedup; and there is no big difference between the OpenMP and
pthreads builds.

I am particularly puzzled by the i5 results: shouldn't the threaded
versions give a noticeable speedup there too?


/David.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion