>
> I find the example of SSE rather enlightening: in theory, you should
> expect a 100-300% speed increase using SSE, but even with pure C code
> in a controlled manner, on one platform (linux + gcc), with varying
> recent CPUs, the results are fundamentally different. So what would
> happen in numpy, where you don't control things that much?
>

This means that what we measure is not what we think we measure: the time we
get depends on more than the number of instructions executed. Has anyone made
a complete, instrumented profile of the code everyone is testing, with
callgrind or the Visual Studio profiler? That would tell us exactly what is
happening (a sketch of such a run follows the list):
- instructions
- cache misses (likely the bottleneck, but without proof, nothing should be
done about it)
- SSE efficiency
- ...
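
For instance, a micro-benchmark along these lines (the array size and the
loop body are made up for illustration) can be compiled with gcc -O2 and run
under valgrind --tool=callgrind --cache-sim=yes; callgrind_annotate then
attributes instruction counts and simulated cache misses to each source line:

/* bench.c -- hypothetical micro-benchmark to profile under callgrind.
 * Build:  gcc -O2 -std=c99 bench.c -o bench
 * Run:    valgrind --tool=callgrind --cache-sim=yes ./bench */
#include <stdlib.h>

#define N (1 << 22)   /* ~32 MB of doubles per array, well beyond the caches */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double s = 0.0;

    for (size_t i = 0; i < N; ++i) {   /* initialise, faulting the pages in */
        a[i] = (double)i;
        b[i] = 1.0;
    }
    for (size_t i = 0; i < N; ++i)     /* the loop whose cost we want to see */
        s += a[i] * b[i];

    free(a);
    free(b);
    return s > 0.0 ? 0 : 1;            /* keep the result live */
}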

I think that to be really efficient, one would need a dynamic prefetcher,
but such things are not available on x86, and even if they were, they would
never make it to the general public because they cannot be proof-tested
(they modify the binary on the fly). But they are really efficient when
going through an array.
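
For what it is worth, x86 does expose *static* prefetch hints (the SSE
prefetch instructions, reachable as _mm_prefetch in <xmmintrin.h>). They are
not a dynamic prefetcher, and the distance of 64 elements below is a guess
that would have to be tuned per CPU, but a sketch of using them while going
through an array looks like this:

/* Hypothetical sketch: static software prefetching with SSE hints.
 * The prefetch distance (64 elements ahead) is made up and would need
 * tuning for each CPU and each loop. */
#include <xmmintrin.h>
#include <stddef.h>

double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + 64 < n) {
            /* ask for the cache lines we will need 64 iterations from now */
            _mm_prefetch((const char *)&a[i + 64], _MM_HINT_T0);
            _mm_prefetch((const char *)&b[i + 64], _MM_HINT_T0);
        }
        s += a[i] * b[i];
    }
    return s;
}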

Matthieu
-- 
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher