See my earlier email - this is Fedora 33, Python 3.9. I'm using the standard Fedora 33 numpy package. ldd says:
/usr/lib64/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so:
    linux-vdso.so.1 (0x00007ffdd1487000)
    libflexiblas.so.3 => /lib64/libflexiblas.so.3 (0x00007f0512787000)

So whatever flexiblas is doing controls blas.

flexiblas print
FlexiBLAS, version 3.0.4
Copyright (C) 2014, 2015, 2016, 2017, 2018, 2019, 2020 Martin Koehler and others.
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.

Configured BLAS libraries:
System-wide (/etc/flexiblasrc):

System-wide from config directory (/etc/flexiblasrc.d/)
 OPENBLAS-OPENMP
   library = libflexiblas_openblas-openmp.so
   comment =
 NETLIB
   library = libflexiblas_netlib.so
   comment =
 ATLAS
   library = libflexiblas_atlas.so
   comment =

User config (/home/nbecker/.flexiblasrc):

Host config (/home/nbecker/.flexiblasrc.nbecker8):

Available hooks:

Backend and hook search paths:
  /usr/lib64/flexiblas/

Default BLAS:
    System:         OPENBLAS-OPENMP
    User:           (none)
    Host:           (none)
    Active Default: OPENBLAS-OPENMP (System)

Runtime properties:
   verbose = 0 (System)

So it looks to me it is using openblas-openmp.

On Tue, Feb 23, 2021 at 8:00 PM Charles R Harris <charlesr.har...@gmail.com> wrote:
>
> On Tue, Feb 23, 2021 at 5:47 PM Charles R Harris <charlesr.har...@gmail.com> wrote:
>>
>> On Tue, Feb 23, 2021 at 11:10 AM Neal Becker <ndbeck...@gmail.com> wrote:
>>>
>>> I have code that performs dot product of a 2D matrix of size (on the
>>> order of) [1000,16] with a vector of size [1000]. The matrix is
>>> float64 and the vector is complex128. I was using numpy.dot but it
>>> turned out to be a bottleneck.
>>>
>>> So I coded dot2x1 in c++ (using xtensor-python just for the
>>> interface). No fancy simd was used, unless g++ did it on it's own.
>>>
>>> On a simple benchmark using timeit I find my hand-coded routine is on
>>> the order of 1000x faster than numpy?
>>> Here is the test code:
>>> My custom c++ code is dot2x1. I'm not copying it here because it has
>>> some dependencies. Any idea what is going on?
>>>
>>> import numpy as np
>>>
>>> from dot2x1 import dot2x1
>>>
>>> a = np.ones ((1000,16))
>>> b = np.array([ 0.80311816+0.80311816j,  0.80311816-0.80311816j,
>>>               -0.80311816+0.80311816j, -0.80311816-0.80311816j,
>>>                1.09707981+0.29396165j,  1.09707981-0.29396165j,
>>>               -1.09707981+0.29396165j, -1.09707981-0.29396165j,
>>>                0.29396165+1.09707981j,  0.29396165-1.09707981j,
>>>               -0.29396165+1.09707981j, -0.29396165-1.09707981j,
>>>                0.25495815+0.25495815j,  0.25495815-0.25495815j,
>>>               -0.25495815+0.25495815j, -0.25495815-0.25495815j])
>>>
>>> def F1():
>>>     d = dot2x1 (a, b)
>>>
>>> def F2():
>>>     d = np.dot (a, b)
>>>
>>> from timeit import timeit
>>> print (timeit ('F1()', globals=globals(), number=1000))
>>> print (timeit ('F2()', globals=globals(), number=1000))
>>>
>>> In [13]: 0.013910860987380147 << 1st timeit
>>> 28.608758996007964 << 2nd timeit
>>
>> I'm going to guess threading, although huge pages can also be a problem on a
>> machine under heavy load running other processes. Call overhead may also
>> matter for such small matrices.
>
> What BLAS library are you using. I get much better results using an 8 year
> old i5 and ATLAS.
>
> Chuck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

--
Those who don't understand recursion are doomed to repeat it
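
A minimal sketch for checking the threading guess quoted above, assuming the openblas-openmp backend honours the standard OpenMP variable OMP_NUM_THREADS. It times only np.dot (dot2x1 is not reproduced here), and the vector b is a made-up stand-in for the 16 values in the test code.

```python
import os

# Assumption: the BLAS backend is OpenMP-threaded and reads OMP_NUM_THREADS.
# Set it before numpy is imported so the OpenMP runtime picks it up.
os.environ["OMP_NUM_THREADS"] = "1"

from timeit import timeit

import numpy as np

a = np.ones((1000, 16))        # float64 matrix, same shape as in the test code
b = np.full(16, 0.8 + 0.8j)    # complex128 stand-in for the 16-element vector


def f2():
    # Same call as F2() in the test code, but returning the result
    return np.dot(a, b)


number = 1000
total = timeit(f2, number=number)
print(f"np.dot with one BLAS thread: {total / number * 1e6:.1f} us per call")
```

If the time per call drops sharply compared with a run at the default thread count, threading in the backend is the likely cause. If it does not change, the guess is probably wrong and the slowdown comes from somewhere else.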