> > I find the example of SSE rather enlightening: in theory, you should
> > expect a 100-300% speed increase using SSE, but even with pure C code,
> > in a controlled manner, on one platform (Linux + GCC), with varying
> > recent CPUs, the results are fundamentally different. So what would
> > happen in NumPy, where you don't control things that much?
This means that what we measure is not what we think we measure: the time we get does not depend only on the number of instructions. Has anyone made a complete instrumented profile of the code everyone is testing, with callgrind or the Visual Studio profiler? That would tell us exactly what is happening:

- instruction counts
- cache issues (the likely bottleneck, but without proof nothing should be done about it)
- SSE efficiency
- ...

I think that to be really efficient, one would need a dynamic prefetcher, but such things are not available on x86, and even if they were, they would never reach the general public because they cannot be verified (they modify binaries on the fly). Yet they are really efficient when walking through an array.

Matthieu

--
French PhD student
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion