Dear all,
Thank you for all of your replies and suggestions! I have written my own matrix
multiplication script to test NumPy’s performance; please find it attached.
I’m using the MKL variant of NumPy. Strangely enough, `port variants
py39-numpy` still returns:
port variants py39-numpy
py39-numpy has the variants:
atlas: Use MacPorts ATLAS Libraries
* conflicts with mkl openblas
gcc10: Build using the MacPorts gcc 10 compiler
* conflicts with gcc11 gcc8 gcc9 gccdevel gfortran gfortran
gcc11: Build using the MacPorts gcc 11 compiler
* conflicts with gcc10 gcc8 gcc9 gccdevel gfortran gfortran
gcc8: Build using the MacPorts gcc 8 compiler
* conflicts with gcc10 gcc11 gcc9 gccdevel gfortran gfortran
gcc9: Build using the MacPorts gcc 9 compiler
* conflicts with gcc10 gcc11 gcc8 gccdevel gfortran gfortran
gccdevel: Build using the MacPorts gcc devel compiler
* conflicts with gcc10 gcc11 gcc8 gcc9 gfortran gfortran
[+]gfortran: Build using the MacPorts gcc 11 Fortran compiler
* conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
mkl: Use MacPorts MKL Libraries
* conflicts with atlas openblas
[+]openblas: Use MacPorts OpenBLAS Libraries
* conflicts with atlas mkl
universal: Build for multiple architectures
Either I don’t understand the expected behaviour, or my `port variants` command
returns something else. I would expect it to show [+]gfortran and [+]mkl, not
[+]openblas. On the other hand, `port installed py39-numpy` shows:
port installed py39-numpy
The following ports are currently installed:
py39-numpy @1.21.5_1+gfortran+mkl
py39-numpy @1.22.0_0+gfortran+mkl (active)
Finally, I wasn’t able to get 8 execution threads with `export
MKL_NUM_THREADS=8`. NumPy was still using 4, but `htop` reported 350–380%
CPU load for the `/usr/bin/env python3 ./dgemm_numpy.py` process. I think this
is good news!
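One possible explanation for the ignored setting, in case it helps: the BLAS
libraries read their thread-count variables when they are first initialised,
so the variables need to be in the environment before NumPy is imported. A
minimal sketch (the three variable names are the ones MKL, OpenMP and
OpenBLAS conventionally honour; which one wins depends on the backend):

```python
import os

# Thread-count variables must be set BEFORE NumPy (and its BLAS backend)
# is loaded; setting them after `import numpy` typically has no effect.
os.environ["MKL_NUM_THREADS"] = "8"       # Intel MKL
os.environ["OMP_NUM_THREADS"] = "8"       # OpenMP runtime used by MKL
os.environ["OPENBLAS_NUM_THREADS"] = "8"  # OpenBLAS, for comparison runs

import numpy as np

# A small matmul just to confirm NumPy still works with these settings.
x = np.random.rand(100, 100)
print((x @ x).shape)
```

If the variables are already exported in the shell before `python3` starts,
the same ordering is achieved without touching the script.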
The `otool` command, executed under
`/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core`,
shows that the MKL backend is being used:
otool -L _multiarray_umath.cpython-39-darwin.so
_multiarray_umath.cpython-39-darwin.so:
/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/libmkl_rt.2.dylib
(compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
version 1311.0.0)
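The same check can be done from inside Python, without `otool`, by capturing
what `np.show_config()` prints. This only assumes that `show_config()` writes
the BLAS/LAPACK build information to stdout; its exact format varies between
NumPy versions, so the sketch just searches for a backend name:

```python
import io
import contextlib
import numpy as np

# np.show_config() prints the BLAS/LAPACK libraries NumPy was built
# against; capture its output so we can search it programmatically.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
config = buf.getvalue().lower()

for backend in ("mkl", "openblas", "accelerate"):
    if backend in config:
        print("linked against:", backend)
```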
I think I still need to experiment with OpenBLAS and compare the performance
numbers. Thank you for your help!
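For that MKL-versus-OpenBLAS comparison it may be easier to compare achieved
GFLOP/s rather than raw seconds; a small helper, using the standard
~2·m·k·n operation count for a matrix multiply (the alpha/beta scaling and
addition terms are negligible at this size):

```python
def dgemm_gflops(m, k, n, seconds):
    """Approximate GFLOP/s for C = alpha*A@B + beta*C.

    The matmul dominates with roughly 2*m*k*n floating-point
    operations, so the rate is 2*m*k*n / time / 1e9.
    """
    return 2.0 * m * k * n / seconds / 1e9

# Example: the 20000^3 multiply from dgemm_numpy.py finishing
# in 100 s corresponds to 160 GFLOP/s.
print(dgemm_gflops(20000, 20000, 20000, 100.0))
```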
—
Best wishes,
Maxim
#!/usr/bin/env python3
# dgemm_numpy.py -- time a large DGEMM-style update C = alpha*A@B + beta*C
import numpy as np
import time

print(np.__version__)
np.show_config()  # prints the BLAS/LAPACK build configuration itself

# Matrix dimensions: A is m x k, B is k x n, C is m x n
m = 20000
k = 20000
n = 20000

# Generate random scalars and matrices, timing the generation step
t0 = time.time()
alpha = np.random.rand()
beta = np.random.rand()
A = np.random.rand(m, k)
B = np.random.rand(k, n)
C = np.random.rand(m, n)
t1 = time.time()
print('Generation time: {0:f}'.format(t1 - t0))
print(' alpha: {0:f}, beta: {1:f}'.format(alpha, beta))

# The actual multiply-and-update; this is where the BLAS threads matter
t0 = time.time()
C = alpha*np.matmul(A, B) + beta*C
t1 = time.time()
print('Multiplication time: {0:f}'.format(t1 - t0))
## @eof dgemm_numpy.py
> On 29 Dec 2021, at 13:33, Joshua Root <[email protected]> wrote:
>
> Maxim Abalenkov wrote:
>
>
>> Dear all,
>>
>> I’m looking for guidance please. I would like to make sure that I use all
>> eight of my CPU cores when I run NumPy under Python 3.9.9 on macOS 12.1.
>> When I run my NumPy code, I see in `htop` that only one `python` process is
>> running, with 20–25% core utilisation. I remember that in the past the stock
>> MacPorts NumPy installation would use Apple’s Accelerate framework,
>> including its multithreaded BLAS and LAPACK
>> (https://developer.apple.com/documentation/accelerate).
>> As I understand it, this is no longer the case.
>>
>> I run Python code using a virtual environment located under
>>
>> /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core
>>
>> When I change into that directory and issue
>>
>> otool -L _multiarray_umath.cpython-39-darwin.so
>>
>> _multiarray_umath.cpython-39-darwin.so:
>> @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version
>> 0.0.0, current version 0.0.0)
>> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
>> version 1281.100.1)
>>
>> In other words, NumPy relies on OpenBLAS. The command `port variants
>> openblas` returns:
>>
>> OpenBLAS has the variants:
>> g95: Build using the g95 Fortran compiler
>> * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>> gcc10: Build using the MacPorts gcc 10 compiler
>> * conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
>> [+]gcc11: Build using the MacPorts gcc 11 compiler
>> * conflicts with g95 g95 gcc10 gcc8 gcc9 gccdevel
>> gcc8: Build using the MacPorts gcc 8 compiler
>> * conflicts with g95 g95 gcc10 gcc11 gcc9 gccdevel
>> gcc9: Build using the MacPorts gcc 9 compiler
>> * conflicts with g95 g95 gcc10 gcc11 gcc8 gccdevel
>> gccdevel: Build using the MacPorts gcc devel compiler
>> * conflicts with g95 g95 gcc10 gcc11 gcc8 gcc9
>> [+]lapack: Add Lapack/CLapack support to the library
>> native: Force compilation on machine to get fully optimized library
>> universal: Build for multiple architectures
>>
>> I tried installing the “native” variant of OpenBLAS port with `sudo port
>> install openblas +native` and setting the environment variable
>> `OMP_NUM_THREADS=8`, but I didn’t see any improvement when running my Python
>> code. I would welcome your help and guidance on this subject.
>>
> I'm using py39-numpy with default variants:
>
> % port installed py39-numpy openblas
> The following ports are currently installed:
> OpenBLAS @0.3.19_0+gcc11+lapack (active)
> py39-numpy @1.21.5_1+gfortran+openblas (active)
>
> I see Python using around 600% CPU on my 6-core machine when running this
> basic benchmark script:
> <https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276>
>
> If you try that and see how many cores it uses, that will at least tell you
> if there is something different about your code. If it doesn't use all the
> cores for you, there are some other environment variables that OpenBLAS looks
> at that you could check:
> <https://github.com/xianyi/OpenBLAS#setting-the-number-of-threads-using-environment-variables>
>
> - Josh
>