Hi Francesc,

Looks like a cool project!  However, I'm not able to achieve the
advertised speed-ups.  I wrote a simple script to try three approaches
to this kind of problem:

1) Native Python code (i.e. NumPy evaluating everything at once, using temporary arrays)
2) Straightforward numexpr evaluation
3) Simple "chunked" evaluation using array.flat views.  (This solves
the memory problem and allows the use of arbitrary Python
expressions).
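In case it helps, the heart of approach (3) is nothing more than a loop
over flat slices; a minimal sketch (the names here are mine, and this is
simplified to a single input array):

```python
import numpy as np

def eval_in_chunks(func, arr, chunksize):
    # Apply an elementwise expression piece by piece, so the temporaries
    # that func() creates never exceed ~chunksize elements.
    out = np.empty(arr.shape)
    n = arr.size
    for start in range(0, n, chunksize):
        stop = min(start + chunksize, n)   # clip the final partial chunk
        out.flat[start:stop] = func(arr.flat[start:stop])
    return out

x = np.linspace(0.0, 1.0, 10**4)
full = np.sin(x) + x ** 2                  # all at once: size-n temporaries
pieces = eval_in_chunks(lambda t: np.sin(t) + t ** 2, x, 512)
match = np.allclose(full, pieces)
```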

I've attached the script; here's the output for the expression
"63 + (a*b) + (c**2) + sin(b)"
along with a few combinations of shapes/dtypes.  As expected, using
anything other than "f8" (double) results in a performance penalty.
Surprisingly, it seems that using chunks via array.flat results in
similar performance for f8, and even better performance for other
dtypes.

(100, 100, 100) f4 (average of 10 runs)
Simple:  0.155238199234
Numexpr:  0.278440499306
Chunked:  0.166213512421

(100, 100, 100) f8 (average of 10 runs)
Simple:  0.241649699211
Numexpr:  0.192837905884
Chunked:  0.183888602257

(100, 100, 100, 10) f4 (average of 10 runs)
Simple:  1.56741549969
Numexpr:  3.40679829121
Chunked:  1.83729870319

(100, 100, 100) i4 (average of 10 runs)
Simple:  0.206279683113
Numexpr:  0.210431909561
Chunked:  0.182894086838

FYI, the current tar file (1.1-1) has a glitch related to the VERSION
file; I added to the bug report at Google Code.

Andrew Collette

On Fri, Jan 16, 2009 at 4:00 AM, Francesc Alted <fal...@pytables.org> wrote:
> ========================
>  Announcing Numexpr 1.1
> ========================
>
> Numexpr is a fast numerical expression evaluator for NumPy.  With it,
> expressions that operate on arrays (like "3*a+4*b") are accelerated
> and use less memory than doing the same calculation in Python.
>
> The expected speed-ups for Numexpr with respect to NumPy are between
> 0.95x and 15x, with 3x or 4x being typical.  The strided and unaligned
> cases have been optimized too, so if the expression contains such arrays,
> the speed-up can increase significantly.  Of course, you will need to
> operate with large arrays (typically larger than the cache size of your
> CPU) to see these improvements in performance.
>
> This release is mainly intended to bring in the improvements made to
> the Numexpr version integrated in PyTables.  So, this standalone
> version of Numexpr will benefit from the well-tested PyTables version
> that has been in production for more than a year now.
>
> In case you want to know more in detail what has changed in this
> version, have a look at ``RELEASE_NOTES.txt`` in the tarball.
>
>
> Where can I find Numexpr?
> =========================
>
> The project is hosted at Google Code, at:
>
> http://code.google.com/p/numexpr/
>
>
> Share your experience
> =====================
>
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may
> have.
>
>
> Enjoy!
>
> --
> Francesc Alted
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
>
import numpy as np
import numexpr as nx
import time

test_shape = (100,100,100)   # All 3 arrays have this shape
test_dtype = 'i4'
nruns = 10                   # Ensemble for timing

test_size = np.product(test_shape)

def chunkify(chunksize):
    """ Very stupid "chunk vectorizer" which keeps memory use down.
        This version requires all inputs to have the same number of elements,
        although it shouldn't be that hard to implement simple broadcasting.
    """

    def chunkifier(func):

        def wrap(*args):

            assert len(args) > 0
            assert all(len(a.flat) == len(args[0].flat) for a in args)

            nelements = len(args[0].flat)

            # np.empty() defaults to float64; every element is
            # overwritten by the loop below.
            out = np.empty(args[0].shape)

            for start in xrange(0, nelements, chunksize):
                # Clip the final (possibly partial) chunk to the array end
                stop = min(start + chunksize, nelements)
                iargs = tuple(a.flat[start:stop] for a in args)
                out.flat[start:stop] = func(*iargs)
            return out

        return wrap

    return chunkifier

test_func_str = "63 + (a*b) + (c**2) + sin(b)"

def test_func(a, b, c):
    return 63 + (a*b) + (c**2) + np.sin(b)

test_func_chunked = chunkify(100*100)(test_func)

# The actual data we'll use
a = np.arange(test_size, dtype=test_dtype).reshape(test_shape)
b = np.arange(test_size, dtype=test_dtype).reshape(test_shape)
c = np.arange(test_size, dtype=test_dtype).reshape(test_shape)


start1 = time.time()
for idx in xrange(nruns):
    result1 = test_func(a, b, c)
stop1 = time.time()

start2 = time.time()
for idx in xrange(nruns):
    result2 = nx.evaluate(test_func_str)
stop2 = time.time()

start3 = time.time()
for idx in xrange(nruns):
    result3 = test_func_chunked(a, b, c)
stop3 = time.time()

print "%s %s (average of %s runs)" % (test_shape, test_dtype, nruns)
print "Simple: ", (stop1-start1)/nruns
print "Numexpr: ", (stop2-start2)/nruns
print "Chunked: ", (stop3-start3)/nruns

