Hello,
I'm trying to understand how the performance of basic functions on vectors
depends on the vector size (n).
When I plot the elapsed time per element (in nanoseconds) against the vector
size (n), I see that I need at least 100 elements in the vector to reach half
the maximum speed (7 ns/element at n = 1e3).
So my question is: what am I measuring between n = 1 and n = 100, and why is
the performance so drastically poorer in this region?
Is this the cost of calling the function?
Is this a problem with my profiling method?
Thanks,
Lionel
CPU(1)     = 300 ns/el
CPU(10)    =  40 ns/el
CPU(100)   =  12 ns/el
CPU(200)   =  10 ns/el
CPU(1_000) =   7 ns/el  (max speed)
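For what it's worth, those numbers look consistent with a simple two-parameter model: a fixed per-call overhead c0 plus a per-element cost c1, so the time per element is roughly c0/n + c1. A minimal sketch (the constants c0 ≈ 293 ns and c1 = 7 ns here are fitted by eye from the table above, not measured):

```julia
# Overhead model: total time ≈ c0 + c1*n, so ns/element ≈ c0/n + c1.
c0 = 293.0   # assumed fixed per-call overhead in ns (from CPU(1) ≈ 300)
c1 = 7.0     # asymptotic per-element cost in ns (from CPU(1_000) ≈ 7)

per_element(n) = c0 / n + c1

for n in (1, 10, 100, 200, 1_000)
    println("n = $n: model predicts $(round(per_element(n); digits = 2)) ns/el")
end
```

The model reproduces the measured shape reasonably well (300 at n = 1, ~36 at n = 10, ~10 at n = 100), which is at least compatible with the "cost of calling the function" explanation.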
using Gadfly, DataFrames

N = [1, 2, 3, 4, 5, 6, 7, 8, 9,
     10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 750,
     1_000, 2_500, 5_000, 7_500, 10_000, 100_000, 1_000_000]

cpu = Float64[]
for n in N
    a = n == 1 ? pi : rand(n)        # scalar for n == 1, vector otherwise
    sqrt(a)                          # warm up: force compilation before timing
    gc()                             # collect garbage before the timed region
    gc_enable(false)                 # keep the GC out of the measurement
    t = mean([@elapsed sqrt(a) for i = 1:100]) * (1e9 / n)  # ns per element
    gc_enable(true)
    push!(cpu, t)
end
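One thing that can bite at small n is that a single @elapsed sample of a call lasting a few nanoseconds mostly measures timer resolution and call overhead. As a cross-check on the profiling method, you could try the BenchmarkTools package (assuming it is available to you), which runs many evaluations per sample and reports a robust minimum; something along these lines:

```julia
using BenchmarkTools

# Cross-check: per-element time from BenchmarkTools' minimum-time estimate.
# $a interpolates the array so the benchmark avoids global-variable overhead.
# Note sqrt.(a) allocates a result vector, so allocation cost is included.
for n in (1, 10, 100, 1_000)
    a = rand(n)
    t = @belapsed sqrt.($a)          # seconds, minimum over many evaluations
    println("n = $n: $(round(t * 1e9 / n; digits = 2)) ns/element")
end
```

If the BenchmarkTools numbers at small n come out much lower than your @elapsed numbers, the plateau below n = 100 in your plot is mostly measurement overhead rather than sqrt itself.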
df = DataFrame(N = N, CPU = cpu)
path = Pkg.dir("MKL") * "/benchmark/"
p = Gadfly.plot(
    layer(df, x = "N", y = "CPU", Geom.line),
    Scale.x_log10,
    Guide.xlabel("n-element vector"),
    Guide.ylabel("CPU time in nsec/element"),
    Guide.title("CPU time for sqrt(X) where X = Float64[] with n elements"))
draw(PNG(path*"sqrt_cpu(n).png", 20cm, 20cm), p)
p