> On 3 Jan 2022, at 5:18 pm, Maxim Abalenkov <maxim.abalen...@gmail.com> wrote:
>
> Dear all,
>
> Thank you for all of your replies and suggestions! I have written my own
> matrix multiplication script to test NumPy’s performance; please find it
> attached. I’m using the MKL variant of NumPy. Strangely enough,
> `port variants py39-numpy` still returns:
>
> port variants py39-numpy
> py39-numpy has the variants:
>    atlas: Use MacPorts ATLAS Libraries
>      * conflicts with mkl openblas
>    gcc10: Build using the MacPorts gcc 10 compiler
>      * conflicts with gcc11 gcc8 gcc9 gccdevel gfortran gfortran
>    gcc11: Build using the MacPorts gcc 11 compiler
>      * conflicts with gcc10 gcc8 gcc9 gccdevel gfortran gfortran
>    gcc8: Build using the MacPorts gcc 8 compiler
>      * conflicts with gcc10 gcc11 gcc9 gccdevel gfortran gfortran
>    gcc9: Build using the MacPorts gcc 9 compiler
>      * conflicts with gcc10 gcc11 gcc8 gccdevel gfortran gfortran
>    gccdevel: Build using the MacPorts gcc devel compiler
>      * conflicts with gcc10 gcc11 gcc8 gcc9 gfortran gfortran
> [+]gfortran: Build using the MacPorts gcc 11 Fortran compiler
>      * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>    mkl: Use MacPorts MKL Libraries
>      * conflicts with atlas openblas
> [+]openblas: Use MacPorts OpenBLAS Libraries
>      * conflicts with atlas mkl
>    universal: Build for multiple architectures
>
> Either I don’t understand the expected behaviour, or my `port variants`
> command returns something else. I would expect it to show [+]gfortran and
> [+]mkl, not [+]openblas.
No. The + sign indicates which variants are enabled by default, not which
ones you happen to have installed. The latter is what the `port installed`
command you use below correctly shows.

> On the other hand, the command `port installed py39-numpy` shows:
>
> port installed py39-numpy
> The following ports are currently installed:
>   py39-numpy @1.21.5_1+gfortran+mkl
>   py39-numpy @1.22.0_0+gfortran+mkl (active)
>
> Finally, I wasn’t able to specify 8 execution threads with
> `export MKL_NUM_THREADS=8`. NumPy was still using 4, but `htop` reported
> 350–380% CPU load for the `/usr/bin/env python3 ./dgemm_numpy.py` process.
> I think this is good news!
>
> The `otool` command, executed under
> `/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core`,
> shows that the MKL backend is being used:
>
> otool -L _multiarray_umath.cpy
> _multiarray_umath.cpython-39-darwin.so:
>         /opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/libmkl_rt.2.dylib (compatibility version 0.0.0, current version 0.0.0)
>         /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
>
> I think I still need to experiment with OpenBLAS and compare the
> performance numbers. Thank you for your help!
>
> --
> Best wishes,
> Maxim
>
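On the thread count: both MKL and OpenBLAS read their `*_NUM_THREADS` variables when the library is first loaded, so they must be set before NumPy is imported (exporting them in the shell before launching Python has the same effect). A minimal sketch of doing it from inside the script; note also that MKL by default sizes its pool to the number of physical cores, which is one plausible (unconfirmed) explanation for seeing 4 threads on a 4-core/8-hyperthread machine:

```python
import os

# Thread-count hints must be in the environment *before* numpy (and hence
# the BLAS backend) is imported, or they have no effect on the thread pool.
os.environ["MKL_NUM_THREADS"] = "8"       # read by the MKL backend
os.environ["OPENBLAS_NUM_THREADS"] = "8"  # read by the OpenBLAS backend

import numpy as np  # deliberately imported only after the env setup

print(np.__version__)
```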
#!/usr/bin/env python3
import time

import numpy as np

print(np.__version__)
np.show_config()  # prints the BLAS/LAPACK build configuration (returns None)

m = 20000
k = 20000
n = 20000

# Generate random scalars and matrices for C = alpha*A*B + beta*C (dgemm)
t0 = time.time()
alpha = np.random.rand()
beta = np.random.rand()
A = np.random.rand(m, k)
B = np.random.rand(k, n)
C = np.random.rand(m, n)
t1 = time.time()
t = t1 - t0
print('Generation time: {0:f}'.format(t))
print('    alpha: {0:f}, beta: {1:f}'.format(alpha, beta))

# Time the multiplication itself
t0 = time.time()
C = alpha*np.matmul(A, B) + beta*C
t1 = time.time()
t = t1 - t0
print('Multiplication time: {0:f}'.format(t))

## @eof dgemm_numpy.py
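A side note on interpreting the timings above: for comparing MKL against OpenBLAS it helps to convert the multiplication time into a floating-point rate. The matrix product dominates at roughly 2*m*k*n operations (the scalar scaling and addition are lower-order), so a rough sketch, with the measured time substituted in:

```python
# Rough achieved rate for C = alpha*A@B + beta*C with square 20000 matrices.
m = k = n = 20000
t = 40.0  # substitute the measured multiplication time in seconds
gflops = 2.0 * m * k * n / t / 1e9
print('Approx. rate: {0:.1f} GFLOP/s'.format(gflops))  # -> Approx. rate: 400.0 GFLOP/s
```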
>>> On 29 Dec 2021, at 13:33, Joshua Root <j...@macports.org> wrote:
>>>
>>> Maxim Abalenkov wrote:
>>>
>>> Dear all,
>>>
>>> I’m looking for guidance, please. I would like to make sure that I use
>>> all eight of my CPU cores when I run NumPy under Python 3.9.9 on macOS
>>> 12.1. When I run my NumPy code, I see in `htop` that only one `python`
>>> process is running and the core utilisation is 20–25%. I remember that
>>> in the past a stock MacPorts NumPy installation would use Apple’s
>>> Accelerate framework, including its multithreaded BLAS and LAPACK
>>> (https://developer.apple.com/documentation/accelerate). As I understand
>>> it, this is no longer the case.
>>>
>>> I run my Python code in a virtual environment, whose NumPy lives under
>>>
>>> /opt/venv/zipfstime/lib/python3.9/site-packages/numpy/core
>>>
>>> When I change to that directory and issue
>>>
>>> otool -L _multiarray_umath.cpython-39-darwin.so
>>>
>>> _multiarray_umath.cpython-39-darwin.so:
>>>         @loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
>>>         /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
>>>
>>> In other words, NumPy relies on OpenBLAS.
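The same backend check can also be done from inside Python, without `otool`: `np.show_config()` prints the BLAS/LAPACK configuration the installed NumPy was built against (an openblas or mkl section should appear, depending on the variant):

```python
import numpy as np

# Prints the build-time BLAS/LAPACK configuration of the installed NumPy,
# e.g. which library (openblas, mkl, ...) it was linked against.
np.show_config()
```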
>>> The command `port variants openblas` returns:
>>>
>>> OpenBLAS has the variants:
>>>    g95: Build using the g95 Fortran compiler
>>>      * conflicts with gcc10 gcc11 gcc8 gcc9 gccdevel
>>>    gcc10: Build using the MacPorts gcc 10 compiler
>>>      * conflicts with g95 g95 gcc11 gcc8 gcc9 gccdevel
>>> [+]gcc11: Build using the MacPorts gcc 11 compiler
>>>      * conflicts with g95 g95 gcc10 gcc8 gcc9 gccdevel
>>>    gcc8: Build using the MacPorts gcc 8 compiler
>>>      * conflicts with g95 g95 gcc10 gcc11 gcc9 gccdevel
>>>    gcc9: Build using the MacPorts gcc 9 compiler
>>>      * conflicts with g95 g95 gcc10 gcc11 gcc8 gccdevel
>>>    gccdevel: Build using the MacPorts gcc devel compiler
>>>      * conflicts with g95 g95 gcc10 gcc11 gcc8 gcc9
>>> [+]lapack: Add Lapack/CLapack support to the library
>>>    native: Force compilation on machine to get fully optimized library
>>>    universal: Build for multiple architectures
>>>
>>> I tried installing the “native” variant of the OpenBLAS port with
>>> `sudo port install openblas +native` and setting the environment variable
>>> `OMP_NUM_THREADS=8`, but I didn’t see any improvement when running my
>>> Python code. I would welcome your help and guidance on this subject.
>>>
>> I'm using py39-numpy with default variants:
>>
>> % port installed py39-numpy openblas
>> The following ports are currently installed:
>>   OpenBLAS @0.3.19_0+gcc11+lapack (active)
>>   py39-numpy @1.21.5_1+gfortran+openblas (active)
>>
>> I see Python using around 600% CPU on my 6-core machine when running this
>> basic benchmark script:
>> <https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276>
>>
>> If you try that and see how many cores it uses, that will at least tell
>> you if there is something different about your code. If it doesn't use all
>> the cores for you, there are some other environment variables that
>> OpenBLAS looks at that you could check:
>> <https://github.com/xianyi/OpenBLAS#setting-the-number-of-threads-using-environment-variables>
>>
>> - Josh
>>
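Per the OpenBLAS README linked above, the library consults several variables, with OPENBLAS_NUM_THREADS taking precedence over GOTO_NUM_THREADS, which in turn takes precedence over OMP_NUM_THREADS (the README is authoritative; treat the ordering here as a summary, not a guarantee). A quick shell sketch:

```shell
# OpenBLAS thread-count variables, first one set wins:
#   OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS
export OPENBLAS_NUM_THREADS=8
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"
```

Then run the benchmark (e.g. `python3 ./dgemm_numpy.py`) in the same shell and watch the per-core load in `htop`.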