On Sunday 23 March 2008, Charles R Harris wrote:

> gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
> cpu: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
>
> Problem size        Simple             Intrin             Inline
>          100   0.0002ms (100.0%)   0.0001ms ( 68.7%)   0.0001ms ( 74.8%)
>         1000   0.0015ms (100.0%)   0.0011ms ( 72.0%)   0.0012ms ( 80.4%)
>        10000   0.0154ms (100.0%)   0.0111ms ( 72.1%)   0.0122ms ( 79.1%)
>       100000   0.1081ms (100.0%)   0.0759ms ( 70.2%)   0.0811ms ( 75.0%)
>      1000000   2.7778ms (100.0%)   2.8172ms (101.4%)   2.7929ms (100.5%)
>     10000000  28.1577ms (100.0%)  28.7332ms (102.0%)  28.4669ms (101.1%)
I'm mystified that your machine completes the 10-million test in just 28 ms, while most of the other, similar processors (some faster than yours) in this thread fall pretty far from your figure. What sort of memory subsystem are you using?

> It looks like memory access is the bottleneck, otherwise running 4
> floats through in parallel should go a lot faster.

Yes, that's probably right. This test mainly measures the machine's memory access speed for large datasets. For small ones, my guess is that the data fits entirely in the caches, so there is no need to transfer it from main memory to the CPU before doing the calculations. However, I'm not sure whether this kind of optimization for small datasets would be very useful in practice (read: general NumPy calculations); I'm rather sceptical about it.

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
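[Editorial sketch, not part of the thread: the cache-vs-bandwidth effect described above can be reproduced with a minimal NumPy timing loop. This is not the C/SSE benchmark the posters ran; the function name `time_multiply` and the chosen sizes are illustrative only. For small arrays the per-element cost is dominated by call overhead and the data lives in cache; for large arrays it flattens out at the memory bandwidth limit.]

```python
# Minimal sketch: time an element-wise float32 multiply at several sizes
# to observe the transition from cache-resident to memory-bound behaviour.
import timeit
import numpy as np

def time_multiply(n, repeats=5, number=10):
    """Return best observed time (seconds) for one multiply of n float32s."""
    a = np.ones(n, dtype=np.float32)
    b = np.ones(n, dtype=np.float32)
    out = np.empty_like(a)
    # Preallocated 'out' avoids timing the allocator; timeit.repeat takes
    # the best of several runs to reduce scheduling noise.
    times = timeit.repeat(lambda: np.multiply(a, b, out=out),
                          number=number, repeat=repeats)
    return min(times) / number

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000, 10_000_000):
        per_elem_ns = time_multiply(n) / n * 1e9
        print(f"n={n:>9}: {per_elem_ns:.3f} ns/element")
```

On most machines the ns/element figure drops from the smallest size (overhead-dominated) and then rises again once the arrays no longer fit in cache, matching the plateau seen in the 1M and 10M rows of the quoted table.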