On Sunday 23 March 2008, Charles R Harris wrote:
> gcc --version: gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
> cpu:  Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
>
>         Problem size              Simple              Intrin              Inline
>                  100   0.0002ms (100.0%)   0.0001ms ( 68.7%)   0.0001ms ( 74.8%)
>                 1000   0.0015ms (100.0%)   0.0011ms ( 72.0%)   0.0012ms ( 80.4%)
>                10000   0.0154ms (100.0%)   0.0111ms ( 72.1%)   0.0122ms ( 79.1%)
>               100000   0.1081ms (100.0%)   0.0759ms ( 70.2%)   0.0811ms ( 75.0%)
>              1000000   2.7778ms (100.0%)   2.8172ms (101.4%)   2.7929ms (100.5%)
>             10000000  28.1577ms (100.0%)  28.7332ms (102.0%)  28.4669ms (101.1%)

I'm mystified that your machine needs just ~28 ms to complete the 
10-million test, while most of the other, similar processors in this 
thread (some of them faster than yours) fall pretty far from your 
figure.  What sort of memory subsystem are you using?

> It looks like memory access is the bottleneck, otherwise running 4
> floats through in parallel should go a lot faster.
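
(For the record, I'm assuming the "Simple" and "Intrin" variants are loops 
roughly like the sketch below -- this is just my guess at the shape of the 
code, not your actual benchmark -- adding two arrays of single-precision 
floats, with the intrinsics version handling four elements per iteration:

#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* "Simple": plain scalar loop, one float per iteration. */
void add_simple(const float *a, const float *b, float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* "Intrin": SSE version, four floats per iteration. */
void add_intrin(const float *a, const float *b, float *c, size_t n)
{
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(b + i);           /* load 4 floats from b */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  /* add and store 4 at once */
    }
    for (; i < n; i++)                             /* scalar remainder */
        c[i] = a[i] + b[i];
}

The names add_simple/add_intrin are made up; they are only meant to show the 
idea of pushing four floats through in parallel.)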

Yes, that's probably right.  This test is mainly measuring the memory 
access speed of the machine for large datasets.  For small ones, my 
guess is that the data already sits in the caches, so it does not have 
to be transported to the CPU before the calculations can start.  
However, I'm not sure whether this kind of optimization for small 
datasets would be very useful in practice (read: general NumPy 
calculations); I'm rather sceptical about it.
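
A quick back-of-the-envelope check supports the memory-bound explanation 
(assuming the kernel is something like c[i] = a[i] + b[i] on 
single-precision floats, which is only my guess at what the benchmark does):

  10,000,000 elements x 4 bytes x 3 streams (2 reads + 1 write) ~= 120 MB
  120 MB / 28 ms ~= 4.3 GB/s

and ~4 GB/s is already close to what a DDR2-based memory subsystem can 
sustain for streaming access, so for the large sizes the SIMD unit would 
simply be starved for data.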

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"