Hi Francesc,

Looks like a cool project! However, I'm not able to achieve the
advertised speed-ups. I wrote a simple script to try three approaches
to this kind of problem:
1) Native Python code (i.e. it tries to do everything at once, using
   temporary arrays)
2) Straightforward numexpr evaluation
3) Simple "chunked" evaluation using array.flat views (this solves the
   memory problem and allows the use of arbitrary Python expressions)

I've attached the script; here's the output for the expression
"63 + (a*b) + (c**2) + sin(b)" for a few combinations of shapes and
dtypes. As expected, using anything other than "f8" (double) results
in a performance penalty. Surprisingly, it seems that using chunks via
array.flat results in similar performance for f8, and even better
performance for the other dtypes.

(100, 100, 100) f4 (average of 10 runs)
Simple:   0.155238199234
Numexpr:  0.278440499306
Chunked:  0.166213512421

(100, 100, 100) f8 (average of 10 runs)
Simple:   0.241649699211
Numexpr:  0.192837905884
Chunked:  0.183888602257

(100, 100, 100, 10) f4 (average of 10 runs)
Simple:   1.56741549969
Numexpr:  3.40679829121
Chunked:  1.83729870319

(100, 100, 100) i4 (average of 10 runs)
Simple:   0.206279683113
Numexpr:  0.210431909561
Chunked:  0.182894086838

FYI, the current tar file (1.1-1) has a glitch related to the VERSION
file; I added a note to the bug report at Google Code.

Andrew Collette

On Fri, Jan 16, 2009 at 4:00 AM, Francesc Alted <fal...@pytables.org> wrote:
> ========================
> Announcing Numexpr 1.1
> ========================
>
> Numexpr is a fast numerical expression evaluator for NumPy. With it,
> expressions that operate on arrays (like "3*a+4*b") are accelerated
> and use less memory than doing the same calculation in Python.
>
> The expected speed-ups of Numexpr with respect to NumPy are between
> 0.95x and 15x, with 3x or 4x being typical values. The strided and
> unaligned case has been optimized too, so if the expression contains
> such arrays, the speed-up can increase significantly. Of course, you
> will need to operate on large arrays (typically larger than the cache
> size of your CPU) to see these improvements in performance.
>
> This release is mainly intended to sync up some of the improvements
> made to the Numexpr version integrated in PyTables. So, this
> standalone version of Numexpr will benefit from the well-tested
> PyTables version, which has been in production for more than a year
> now.
>
> In case you want to know in more detail what has changed in this
> version, have a look at ``RELEASE_NOTES.txt`` in the tarball.
>
>
> Where can I find Numexpr?
> =========================
>
> The project is hosted at Google Code at:
>
> http://code.google.com/p/numexpr/
>
>
> Share your experience
> =====================
>
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may
> have.
>
>
> Enjoy!
>
> --
> Francesc Alted
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
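For anyone who hasn't tried the package, basic usage is just a string
expression plus arrays picked up from the calling scope; a minimal
sketch (the array names a and b are arbitrary examples, not from the
announcement):

```python
import numpy as np
import numexpr as ne

a = np.arange(1e5)
b = np.arange(1e5)

# numexpr compiles the expression string to its own bytecode and
# evaluates it in cache-sized blocks, avoiding the full-size
# temporaries that the plain NumPy expression would allocate.
result = ne.evaluate("3*a + 4*b")

# The result should match the ordinary NumPy computation.
assert np.allclose(result, 3*a + 4*b)
```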
import numpy as np
import numexpr as nx
import time

test_shape = (100, 100, 100)   # All 3 arrays have this shape
test_dtype = 'i4'
nruns = 10                     # Ensemble for timing

test_size = np.product(test_shape)

def chunkify(chunksize):
    """ Very stupid "chunk vectorizer" which keeps memory use down.

        This version requires all inputs to have the same number of
        elements, although it shouldn't be that hard to implement
        simple broadcasting.
    """
    def chunkifier(func):

        def wrap(*args):

            assert len(args) > 0
            assert all(len(a.flat) == len(args[0].flat) for a in args)

            nelements = len(args[0].flat)

            # float64 output, like the original; sin() promotes to
            # float anyway
            out = np.empty(args[0].shape)

            for start in xrange(0, nelements, chunksize):
                # Clamp the last chunk to the end of the arrays
                stop = min(start + chunksize, nelements)
                iargs = tuple(a.flat[start:stop] for a in args)
                out.flat[start:stop] = func(*iargs)

            return out

        return wrap

    return chunkifier

test_func_str = "63 + (a*b) + (c**2) + sin(b)"

def test_func(a, b, c):
    return 63 + (a*b) + (c**2) + np.sin(b)

test_func_chunked = chunkify(100*100)(test_func)

# The actual data we'll use
a = np.arange(test_size, dtype=test_dtype).reshape(test_shape)
b = np.arange(test_size, dtype=test_dtype).reshape(test_shape)
c = np.arange(test_size, dtype=test_dtype).reshape(test_shape)

start1 = time.time()
for idx in xrange(nruns):
    result1 = test_func(a, b, c)
stop1 = time.time()

start2 = time.time()
for idx in xrange(nruns):
    result2 = nx.evaluate(test_func_str)
stop2 = time.time()

start3 = time.time()
for idx in xrange(nruns):
    result3 = test_func_chunked(a, b, c)
stop3 = time.time()

print "%s %s (average of %s runs)" % (test_shape, test_dtype, nruns)
print "Simple:  ", (stop1-start1)/nruns
print "Numexpr: ", (stop2-start2)/nruns
print "Chunked: ", (stop3-start3)/nruns
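The array.flat chunking trick from the script can be sanity-checked in
isolation; a minimal sketch (written for Python 3 here, plain NumPy
only; eval_in_chunks and the small test arrays are illustrative, not
part of the attached script):

```python
import numpy as np

def eval_in_chunks(func, chunksize, *args):
    """Evaluate func over flat chunks of the input arrays,
    writing each piece into a preallocated output array."""
    n = args[0].size
    out = np.empty(args[0].shape)                 # float64 result
    for start in range(0, n, chunksize):
        stop = min(start + chunksize, n)          # clamp the last chunk
        out.flat[start:stop] = func(*(a.flat[start:stop] for a in args))
    return out

a = np.arange(1000.0).reshape(10, 100)
b = a + 1
c = a + 2

expr = lambda a, b, c: 63 + (a*b) + (c**2) + np.sin(b)

full = expr(a, b, c)
# 128 deliberately does not divide 1000, exercising the clamped chunk
chunked = eval_in_chunks(expr, 128, a, b, c)

assert np.allclose(full, chunked)
```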