Works much, much better with the current svn version. :) Numexpr now
outperforms everything except the "simple" technique, and then only
for small data sets.

Along the lines you mentioned, I noticed that simply changing the
shape from (100*100*100,) to (100, 100, 100) results in nearly a
factor-of-2 slowdown, and that factor seems to stay constant as the
size of the data set changes.  Is this related to the way numexpr
handles broadcasting rules?  It would seem the memory contents should
be identical in these two cases.
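
For reference, the comparison I'm timing is roughly the following (a
minimal sketch: the expression is the one from my script, but the
bench() helper and the timing loop are just for illustration):

    import time
    import numpy as np
    import numexpr as ne

    def bench(shape, runs=10):
        # Same random data in both cases; only the shape differs.
        a = np.random.rand(*shape)
        b = np.random.rand(*shape)
        c = np.random.rand(*shape)
        t0 = time.time()
        for _ in range(runs):
            ne.evaluate("63 + (a*b) + (c**2) + sin(b)")
        return (time.time() - t0) / runs

    print(bench((100*100*100,)))   # flat 1-D shape
    print(bench((100, 100, 100)))  # same data as 3-D: ~2x slower here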

Andrew

On Tue, Jan 20, 2009 at 6:13 AM, Francesc Alted <fal...@pytables.org> wrote:
> On Tuesday, 20 January 2009, Andrew Collette wrote:
>> Hi Francesc,
>>
>> Looks like a cool project!  However, I'm not able to achieve the
>> advertised speed-ups.  I wrote a simple script to try three
>> approaches to this kind of problem:
>>
>> 1) Native Python code (i.e. everything evaluated at once, using
>> temporary arrays)
>> 2) Straightforward numexpr evaluation
>> 3) Simple "chunked" evaluation using array.flat views.  (This solves
>> the memory problem and allows the use of arbitrary Python
>> expressions; see the sketch below.)
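>>
>> The chunked approach is roughly the following (a minimal sketch of
>> the idea only: the helper name and chunk size are illustrative, and
>> reshape(-1) stands in for the array.flat views):
>>
>>     import numpy as np
>>
>>     def chunked_eval(expr, out, chunk=100000, **arrays):
>>         # Work on flat 1-D views, one chunk at a time, so the
>>         # temporaries stay cache-sized rather than full-array-sized.
>>         flats = dict((k, v.reshape(-1)) for k, v in arrays.items())
>>         oflat = out.reshape(-1)
>>         for i in range(0, oflat.shape[0], chunk):
>>             view = dict((k, v[i:i + chunk]) for k, v in flats.items())
>>             oflat[i:i + chunk] = eval(expr, {"sin": np.sin}, view)
>>
>>     a = np.random.rand(100, 100, 100)
>>     b = np.random.rand(100, 100, 100)
>>     c = np.random.rand(100, 100, 100)
>>     out = np.empty_like(a)
>>     chunked_eval("63 + (a*b) + (c**2) + sin(b)", out, a=a, b=b, c=c)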
>>
>> I've attached the script; here's the output for the expression
>> "63 + (a*b) + (c**2) + sin(b)"
>> along with a few combinations of shapes/dtypes.  As expected, using
>> anything other than "f8" (double) results in a performance penalty.
>> Surprisingly, it seems that using chunks via array.flat results in
>> similar performance for f8, and even better performance for other
>> dtypes.
> [clip]
>
> Well, there were two issues there.  The first one is that when
> transcendental functions are used (like sin() above), the bottleneck
> is the CPU instead of memory bandwidth, so the numexpr speed-ups are
> not as high as usual.  The other issue was an actual bug in the
> numexpr code that forced a copy of all multidimensional arrays (I
> normally use only one-dimensional arrays for benchmarks).  This has
> been fixed in trunk (r39).
>
> So, with the fix applied, the timings are:
>
> (100, 100, 100) f4 (average of 10 runs)
> Simple:  0.0426136016846
> Numexpr:  0.11350851059
> Chunked:  0.0635252952576
> (100, 100, 100) f8 (average of 10 runs)
> Simple:  0.119254398346
> Numexpr:  0.10092959404
> Chunked:  0.128384995461
>
> The speed-up is now a mere 20% (for f8), but at least it is not
> slower.  With the patches that Georg recently contributed for using
> Intel's VML, the acceleration is a bit better:
>
> (100, 100, 100) f4 (average of 10 runs)
> Simple:  0.0417867898941
> Numexpr:  0.0944641113281
> Chunked:  0.0636183023453
> (100, 100, 100) f8 (average of 10 runs)
> Simple:  0.120059680939
> Numexpr:  0.0832288980484
> Chunked:  0.128114104271
>
> i.e. the speed-up is around 45% (for f8).
>
> Moreover, if I get rid of the sin() function and use the expression:
>
> "63 + (a*b) + (c**2) + b"
>
> I get:
>
> (100, 100, 100) f4 (average of 10 runs)
> Simple:  0.0119329929352
> Numexpr:  0.0198570966721
> Chunked:  0.0338240146637
> (100, 100, 100) f8 (average of 10 runs)
> Simple:  0.0255623102188
> Numexpr:  0.00832500457764
> Chunked:  0.0340095996857
>
> which has a 3.1x speedup (for f8).
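>
> For reference, the two versions being compared are essentially these
> (a minimal sketch; the array names and shapes are taken from the
> benchmark above):
>
>     import numpy as np
>     import numexpr as ne
>
>     a = np.random.rand(100, 100, 100)
>     b = np.random.rand(100, 100, 100)
>     c = np.random.rand(100, 100, 100)
>
>     # "Simple" NumPy: each operation allocates a full-size temporary,
>     # so memory bandwidth dominates the run time.
>     r1 = 63 + (a*b) + (c**2) + b
>
>     # numexpr: one pass over the data in cache-sized blocks, with no
>     # full-size temporaries.
>     r2 = ne.evaluate("63 + (a*b) + (c**2) + b")
>
>     assert np.allclose(r1, r2)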
>
>> FYI, the current tar file (1.1-1) has a glitch related to the VERSION
>> file; I added a note to the bug report at Google Code.
>
> Thanks.  I will focus on that asap.  Mmm, seems like there is enough
> stuff for another release of numexpr.  I'll try to do it soon.
>
> Cheers,
>
> --
> Francesc Alted