One suspect is that the numpy build appears to be multi-threaded. That isn't useful here, because I'm already running parallel Monte Carlo simulations on all cores, so perhaps the extra BLAS threads are perversely slowing things down through oversubscription? I don't see how that would account for a 1000x slowdown, though.
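If oversubscription really is the problem, capping the BLAS thread pool should make it visible. Something like this (a quick sketch using threadpoolctl, which I haven't actually tried against this benchmark; the b vector here is just a stand-in) would pin BLAS to one thread:

    import numpy as np
    from threadpoolctl import threadpool_limits

    a = np.ones((1000, 16))                      # float64 matrix
    b = np.exp(2j * np.pi * np.arange(16) / 16)  # stand-in complex128 vector

    # Restrict all BLAS pools (OpenBLAS/MKL/BLIS) to a single thread;
    # if contention with the Monte Carlo workers is the culprit,
    # np.dot should get dramatically faster inside this block.
    with threadpool_limits(limits=1, user_api="blas"):
        d = np.dot(a, b)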
On Tue, Feb 23, 2021 at 1:40 PM Roman Yurchak <rth.yurc...@gmail.com> wrote:
>
> For the first benchmark, A.dot(B) with A real and B complex is
> apparently a known performance issue:
> https://github.com/numpy/numpy/issues/10468
>
> In general, it might be worth trying different BLAS backends. For
> instance, if you install numpy from conda-forge you should be able to
> switch between OpenBLAS, MKL and BLIS:
> https://conda-forge.org/docs/maintainer/knowledge_base.html#switching-blas-implementation
>
> Roman
>
> On 23/02/2021 19:32, Andrea Gavana wrote:
> > Hi,
> >
> > On Tue, 23 Feb 2021 at 19.11, Neal Becker <ndbeck...@gmail.com> wrote:
> >
> >     I have code that performs a dot product of a 2D matrix of size (on
> >     the order of) [1000, 16] with a vector of size [16]. The matrix is
> >     float64 and the vector is complex128. I was using numpy.dot, but it
> >     turned out to be a bottleneck.
> >
> >     So I coded dot2x1 in C++ (using xtensor-python just for the
> >     interface). No fancy SIMD was used, unless g++ added it on its own.
> >
> >     In a simple benchmark using timeit, I find my hand-coded routine is
> >     on the order of 1000x faster than numpy. The test code is below; my
> >     custom C++ routine is dot2x1. I'm not copying it here because it
> >     has some dependencies. Any idea what is going on?
> >
> > I had a similar experience - albeit with an older numpy and Python 2.7,
> > so my comments may well be outdated and irrelevant. This was on Windows
> > 10, 64-bit, with far more than enough RAM.
> >
> > It took me forever to find out that numpy.dot was the culprit, and I
> > ended up using Fortran + f2py. Even with the overhead of going through
> > the f2py bridge, the Fortran dot_product was several times faster.
> >
> > Sorry if it doesn't help much.
> >
> > Andrea.
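> A possible workaround until that issue is addressed (an untested
> sketch; the arrays below are just placeholders for Neal's data): split
> the complex vector into its real and imaginary parts, so that both
> products take the fast all-float64 BLAS path, and recombine afterwards:
>
>     import numpy as np
>
>     a = np.ones((1000, 16))                      # float64 matrix
>     b = np.exp(2j * np.pi * np.arange(16) / 16)  # complex128 vector
>
>     # Two pure float64 GEMV calls instead of one mixed-dtype dot,
>     # which currently falls back to a slow non-BLAS loop.
>     d = a.dot(b.real) + 1j * a.dot(b.imag)
>
>     assert np.allclose(d, a.dot(b))  # same result as the direct dot
>
> numpy.show_config() will also tell you which BLAS implementation a
> given build is actually linked against.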
> >     import numpy as np
> >     from dot2x1 import dot2x1
> >
> >     a = np.ones((1000, 16))
> >     b = np.array([ 0.80311816+0.80311816j,  0.80311816-0.80311816j,
> >                   -0.80311816+0.80311816j, -0.80311816-0.80311816j,
> >                    1.09707981+0.29396165j,  1.09707981-0.29396165j,
> >                   -1.09707981+0.29396165j, -1.09707981-0.29396165j,
> >                    0.29396165+1.09707981j,  0.29396165-1.09707981j,
> >                   -0.29396165+1.09707981j, -0.29396165-1.09707981j,
> >                    0.25495815+0.25495815j,  0.25495815-0.25495815j,
> >                   -0.25495815+0.25495815j, -0.25495815-0.25495815j])
> >
> >     def F1():
> >         d = dot2x1(a, b)
> >
> >     def F2():
> >         d = np.dot(a, b)
> >
> >     from timeit import timeit
> >     print(timeit('F1()', globals=globals(), number=1000))
> >     print(timeit('F2()', globals=globals(), number=1000))
> >
> >     In [13]: 0.013910860987380147   << 1st timeit (dot2x1)
> >              28.608758996007964     << 2nd timeit (np.dot)

--
Those who don't understand recursion are doomed to repeat it
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion