On 4/10/12 6:44 AM, Henry Gomersall wrote:
Here is the body of a post I made on stackoverflow, but it seems to be a non-obvious issue. I was hoping someone here might be able to shed light on it...

On my 32-bit Windows Vista machine I notice a significant (5x) slowdown when taking the absolute values of a fairly large numpy.complex64 array when compared to a numpy.complex128 array.

>>> import numpy
>>> a = numpy.random.randn(256, 2048) + 1j*numpy.random.randn(256, 2048)
>>> b = numpy.complex64(a)
>>> timeit c = numpy.float32(numpy.abs(a))
10 loops, best of 3: 27.5 ms per loop
>>> timeit c = numpy.abs(b)
1 loops, best of 3: 143 ms per loop

Obviously, the outputs in both cases are the same (to working precision).
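For what it's worth, the agreement can be verified with numpy.allclose at single-precision tolerance. A minimal sketch, using the same array shapes as the session above:

```python
import numpy

# Reproduce the arrays from the session above
a = numpy.random.randn(256, 2048) + 1j * numpy.random.randn(256, 2048)
b = numpy.complex64(a)

c_double = numpy.float32(numpy.abs(a))  # abs of complex128, cast down to float32
c_single = numpy.abs(b)                 # abs of complex64, already float32

# Both paths produce float32 results that agree to single precision
assert c_single.dtype == numpy.float32
assert numpy.allclose(c_double, c_single, rtol=1e-5)
```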

I do not notice the same effect on my Ubuntu 64-bit machine (indeed, as one might expect, the double precision array operation is a bit slower).

Is there a rational explanation for this?

Is this something that is common to all Windows machines?


I cannot tell for sure, but it looks like the Windows build of NumPy is casting complex64 to complex128 internally. I'm guessing here, but numexpr lacks the complex64 type, so it has to do the upcast internally, and I'm seeing roughly the same slowdown:

In [6]: timeit numpy.abs(a)
100 loops, best of 3: 10.7 ms per loop

In [7]: timeit numpy.abs(b)
100 loops, best of 3: 8.51 ms per loop

In [8]: timeit numexpr.evaluate("abs(a)")
100 loops, best of 3: 1.67 ms per loop

In [9]: timeit numexpr.evaluate("abs(b)")
100 loops, best of 3: 4.96 ms per loop

In my case I'm seeing only a 3x slowdown, but that is because numexpr is not casting the result back to complex64, whereas the Windows build might be doing so. Just to make sure, can you run this:

In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
100 loops, best of 3: 12.3 ms per loop

In [11]: timeit c = numpy.abs(b)
100 loops, best of 3: 8.45 ms per loop

on your Windows box and see whether they give similar results?

On a related note of confusion, the times above are notably (and consistently) shorter than those I get with a naive `st = time.time(); numpy.abs(a); print time.time()-st`. Is this to be expected?


Yes, this happens a lot, especially when your code is memory-bottlenecked (a very common situation). The explanation is simple: when your dataset is small enough to fit in the CPU cache, the first iteration of the timing loop brings the whole working set into cache, so subsequent iterations do not have to fetch data from main memory. By the time the loop has run 10 times or more, you are discarding any memory effect. However, when you run the operation only once, you are measuring the memory fetch time too (which is often much more realistic).

--
Francesc Alted

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion