On 4/10/12 6:44 AM, Henry Gomersall wrote:
Here is the body of a post I made on stackoverflow, but it seems to be a non-obvious issue. I was hoping someone here might be able to shed light on it...

On my 32-bit Windows Vista machine I notice a significant (5x) slowdown when taking the absolute values of a fairly large numpy.complex64 array when compared to a numpy.complex128 array.

>>> import numpy
>>> a = numpy.random.randn(256, 2048) + 1j*numpy.random.randn(256, 2048)
>>> b = numpy.complex64(a)
>>> timeit c = numpy.float32(numpy.abs(a))
10 loops, best of 3: 27.5 ms per loop
>>> timeit c = numpy.abs(b)
1 loops, best of 3: 143 ms per loop

Obviously, the outputs in both cases are the same (to working precision).
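For what it's worth, the agreement can be verified with numpy.allclose at single-precision tolerance. A minimal sketch, using the same array shapes as the session above:

```python
import numpy

# Reproduce the arrays from the session above
a = numpy.random.randn(256, 2048) + 1j * numpy.random.randn(256, 2048)
b = numpy.complex64(a)

c_double = numpy.float32(numpy.abs(a))  # abs of complex128, cast down to float32
c_single = numpy.abs(b)                 # abs of complex64, already float32

# Both paths produce float32 results that agree to single precision
assert c_single.dtype == numpy.float32
assert numpy.allclose(c_double, c_single, rtol=1e-5)
```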

I do not notice the same effect on my Ubuntu 64-bit machine (indeed, as one might expect, the double precision array operation is a bit slower).

Is there a rational explanation for this?

Is this something that is common to all Windows machines?


I cannot tell for sure, but it looks like the Windows build of NumPy is casting complex64 to complex128 internally. I'm guessing here, but numexpr lacks the complex64 type, so it has to do the upcast internally, and I'm seeing roughly the same slowdown:

In [6]: timeit numpy.abs(a)
100 loops, best of 3: 10.7 ms per loop

In [7]: timeit numpy.abs(b)
100 loops, best of 3: 8.51 ms per loop

In [8]: timeit numexpr.evaluate("abs(a)")
100 loops, best of 3: 1.67 ms per loop

In [9]: timeit numexpr.evaluate("abs(b)")
100 loops, best of 3: 4.96 ms per loop

In my case I'm seeing only a 3x slowdown, but that is because numexpr is not casting the result back to complex64, whereas the Windows build might be doing so. Just to make sure, can you run this:

In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
100 loops, best of 3: 12.3 ms per loop

In [11]: timeit c = numpy.abs(b)
100 loops, best of 3: 8.45 ms per loop

on your Windows box and see whether they give similar results?

On a related note of confusion, the times above are notably (and consistently) shorter than those I get with a naive `st = time.time(); numpy.abs(a); print time.time()-st`. Is this to be expected?


Yes, this happens a lot, especially when your code is memory-bottlenecked (a very common situation). The explanation is simple: when your dataset is small enough to fit in the CPU cache, the first iteration of the timing loop brings the whole working set into cache, so subsequent iterations do not have to fetch data from main memory. By the time the loop has run 10 times or more, you are discarding any memory effect. However, when you run the operation only once, you are measuring the memory fetch time too (which is often much more realistic).

--
Francesc Alted

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion