A Wednesday 22 December 2010 01:53:55 Mark Wiebe escrigué: > Hello NumPy-ers, > > After some performance analysis, I've designed and implemented a new > iterator designed to speed up ufuncs and allow for easier > multi-dimensional iteration. The new code is fairly large, but > works quite well already. If some people could read the NEP and > give some feedback, that would be great! Here's a link: > > https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator > -ufunc.rst > > I would also love it if someone could try building the code and play > around with it a bit. The github branch is here: > > https://github.com/m-paradox/numpy/tree/new_iterator > > To give a taste of the iterator's functionality, below is an example > from the NEP for how to implement a "Lambda UFunc." With just a few > lines of code, it's possible to replicate something similar to the > numexpr library (numexpr still gets a bigger speedup, though). In > the example expression I chose, execution time went from 138ms to > 61ms. > > Hopefully this is a good Christmas present for NumPy. :) > > Cheers, > Mark > > Here is the definition of the ``luf`` function.:: > > def luf(lamdaexpr, *args, **kwargs): > """Lambda UFunc > > e.g. > c = luf(lambda i,j:i+j, a, b, order='K', > casting='safe', buffersize=8192) > > c = np.empty(...) > luf(lambda i,j:i+j, a, b, out=c, order='K', > casting='safe', buffersize=8192) > """ > > nargs = len(args) > op = args + (kwargs.get('out',None),) > it = np.newiter(op, ['buffered','no_inner_iteration'], > [['readonly','nbo_aligned']]*nargs + > > [['writeonly','allocate','no_broadcast']], > order=kwargs.get('order','K'), > casting=kwargs.get('casting','safe'), > buffersize=kwargs.get('buffersize',0)) > while not it.finished: > it[-1] = lamdaexpr(*it[:-1]) > it.iternext() > > return it.operands[-1] > > Then, by using ``luf`` instead of straight Python expressions, we > can gain some performance from better cache behavior.:: > > In [2]: a = np.random.random((50,50,50,10)) > In [3]: b = np.random.random((50,50,1,10)) > In [4]: c = np.random.random((50,50,50,1)) > > In [5]: timeit 3*a+b-(a/c) > 1 loops, best of 3: 138 ms per loop > > In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 60.9 ms per loop > > In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, > c)) Out[7]: True
Wow, really nice work! It would be great if that could make into NumPy :-) Regarding your comment on numexpr being faster, I'm not sure (your new_iterator branch does not work for me; it gives me an error like: AttributeError: 'module' object has no attribute 'newiter'), but my guess is that your approach seems actually faster: >>> a = np.random.random((50,50,50,10)) >>> b = np.random.random((50,50,1,10)) >>> c = np.random.random((50,50,50,1)) >>> timeit 3*a+b-(a/c) 10 loops, best of 3: 67.5 ms per loop >>> import numexpr as ne >>> ne.evaluate("3*a+b-(a/c) >>> timeit ne.evaluate("3*a+b-(a/c)") 10 loops, best of 3: 42.8 ms per loop i.e. numexpr is not able to achieve the 2x speedup mark that you are getting with ``luf`` (using a Core2 @ 3 GHz here). -- Francesc Alted _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion