On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith <n...@pobox.com> wrote: > How would you fix it? I shouldn't speculate without profiling, but I'll be > naughty. Presumably the problem is that python turns that into something > like > > hist[i,j] = hist[i,j] + 1 > > which means there's no way for numpy to avoid creating a temporary array. So > maybe this could be fixed by adding a fused __inplace_add__ protocol to the > language (and similarly for all the other inplace operators), but that seems > really unlikely. Fundamentally this is just the sort of optimization > opportunity you miss when you don't have a compiler with a global view; > Fortran or c++ expression templates will win every time. Maybe pypy will fix > it someday. > > Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? > > - N
Nope, don't buy it: In [33]: timeit arr.__iadd__(1) 1000 loops, best of 3: 1.13 ms per loop In [37]: timeit arr[:] += 1 1000 loops, best of 3: 1.13 ms per loop - Wes > On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmck...@gmail.com> wrote: >> >> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver >> <m.oli...@jacobs-university.de> wrote: >> > Hi, >> > >> > I have a short piece of code where the use of an index array "feels >> > right", but incurs a severe performance penalty: It's about an order >> > of magnitude slower than all other operations with arrays of that >> > size. >> > >> > It comes up in a piece of code which is doing a large number of "on >> > the fly" histograms via >> > >> > hist[i,j] += 1 >> > >> > where i is an array with the bin index to be incremented and j is >> > simply enumerating the histograms. I attach a full short sample code >> > below which shows how it's being used in context, and corresponding >> > timeit output from the critical code section. >> > >> > Questions: >> > >> > - Is this a matter of principle, or due to an inefficient >> > implementation? >> > - Is there an equivalent way of doing it which is fast? >> > >> > Regards, >> > Marcel >> > >> > ================================================================= >> > >> > #! /usr/bin/env python >> > # Plot the bifurcation diagram of the logistic map >> > >> > from pylab import * >> > >> > Nx = 800 >> > Ny = 600 >> > I = 50000 >> > >> > rmin = 2.5 >> > rmax = 4.0 >> > ymin = 0.0 >> > ymax = 1.0 >> > >> > rr = linspace (rmin, rmax, Nx) >> > x = 0.5*ones(rr.shape) >> > hist = zeros((Ny+1,Nx), dtype=int) >> > j = arange(Nx) >> > >> > dy = ymax/Ny >> > >> > def f(x): >> > return rr*x*(1.0-x) >> > >> > for n in xrange(1000): >> > x = f(x) >> > >> > for n in xrange(I): >> > x = f(x) >> > i = array(x/dy, dtype=int) >> > hist[i,j] += 1 >> > >> > figure() >> > >> > imshow(hist, >> > cmap='binary', >> > origin='lower', >> > interpolation='nearest', >> > extent=(rmin,rmax,ymin,ymax), >> > norm=matplotlib.colors.LogNorm()) >> > >> > xlabel ('$r$') >> > ylabel ('$x$') >> > >> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') >> > >> > show() >> > >> > ==================================================================== >> > >> > In [4]: timeit y=f(x) >> > 10000 loops, best of 3: 19.4 us per loop >> > >> > In [5]: timeit i = array(x/dy, dtype=int) >> > 10000 loops, best of 3: 22 us per loop >> > >> > In [6]: timeit img[i,j] += 1 >> > 10000 loops, best of 3: 119 us per loop >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion@scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> This suggests to me that fancy indexing could be quite a bit faster in >> this case: >> >> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop >> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) >> 10000 loops, best of 3: 20.6 us per loop >> >> I wrote a simple Cython method >> >> def fancy_inc(ndarray[int64_t, ndim=2] values, >> ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): >> cdef: >> Py_ssize_t i, n = len(iarr) >> >> for i in range(n): >> values[iarr[i], jarr[i]] += inc >> >> that does even faster >> >> In [8]: timeit sbx.fancy_inc(hist, i, j, 1) >> 100000 loops, best of 3: 4.85 us per loop >> >> About 10% faster if bounds checking and wraparound are disabled. >> >> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? >> >> - Wes >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion