This is my first look at Julia. I read somewhere that de-vectorizing code
is advised, so I tried this:

function matmul(a,b)
    c = zeros(typeof(a[1,1]), (size(a,1), size(b,2)))
    for j = 1:size(b,2)
        for i = 1:size(a,1)
            for k = 1:size(b,1)
                c[i,j] += a[i,k]*b[k,j]
            end
        end
    end
    c
end

function matmul2(a,b)
    a*b
end

a = rand(2,3);
b = rand(3,4);
c = matmul(a,b);   # just to make the JIT
c1 = matmul2(a,b); # compile the functions ahead of @time
a = rand(6000,500);
b = rand(500,8000);
@time(matmul(a,b);)
@time(matmul2(a,b);)

and I got this:
elapsed time: 150.661463517 seconds (384000192 bytes allocated)
elapsed time: 0.990317124 seconds (384000192 bytes allocated)
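
To put those two timings in perspective, here is a rough back-of-the-envelope
calculation. It assumes the usual count of 2*m*k*n floating-point operations
for an m-by-k times k-by-n product; the variable names are only for
illustration, and the `digits` keyword assumes a reasonably recent Julia.

```julia
# Rough throughput estimate from the two timings reported above.
m, k, n = 6000, 500, 8000
flops = 2 * m * k * n            # one multiply and one add per inner iteration

t_loop = 150.661463517           # seconds, hand-written matmul
t_blas = 0.990317124             # seconds, built-in a*b

gflops_loop = flops / t_loop / 1e9
gflops_blas = flops / t_blas / 1e9
speedup = t_loop / t_blas

println("loop:    ", round(gflops_loop, digits=2), " GFLOP/s")
println("a*b:     ", round(gflops_blas, digits=2), " GFLOP/s")
println("speedup: ", round(speedup, digits=1), "x")

# Size of the Float64 result matrix (8 bytes per element)
result_bytes = m * n * 8
println("result matrix: ", result_bytes, " bytes")
```

So the loop version runs at roughly 0.3 GFLOP/s and a*b at roughly 48 GFLOP/s.
The 384000192 bytes reported in both cases are essentially just the
6000x8000 result matrix (384000000 bytes), so neither version allocates
anything extra inside the loops.
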
I assume the built-in matrix multiplication calls some kind of BLAS, maybe
written in Fortran (or assembler?), maybe optimized for SSE2, and surely using
all 4 of my cores, so this is not the typical case where de-vectorizing is
advisable.
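
My guess about the 4 cores could be checked with something like the sketch
below. It assumes a recent Julia where the LinearAlgebra standard library
provides BLAS.get_num_threads(); older versions may not have that function.

```julia
using LinearAlgebra

# Number of threads the BLAS library is currently set to use
blas_threads = BLAS.get_num_threads()
println("BLAS threads: ", blas_threads)

# Number of logical CPU threads Julia detects on this machine
println("CPU threads:  ", Sys.CPU_THREADS)
```

If the first number is greater than 1, a*b is running multithreaded while my
matmul runs on a single core, which would account for part of the gap.
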
Nonetheless, isn't a factor of 150 a bit higher than expected? Did I miss
something important in the matmul code?
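
One thing I could still try, as a sketch only: Julia stores arrays in
column-major order, so the innermost loop over k in matmul reads a[i,k] along
a row, jumping size(a,1) elements at each step. Reordering the loops so that i
is innermost makes the accesses to both c and a contiguous. The name
matmul_jki and the use of @inbounds are my own choices for illustration; I
have not measured how much this helps, and I would not expect it to reach
BLAS speed.

```julia
using LinearAlgebra  # matrix * is defined in this standard library (Julia 0.7 and later)

# Hypothetical loop-reordered variant: j outermost, i innermost,
# so the inner loop runs down a column of both c and a.
function matmul_jki(a, b)
    m, n = size(a, 1), size(b, 2)
    p = size(a, 2)
    size(b, 1) == p || error("inner dimensions must match")
    c = zeros(promote_type(eltype(a), eltype(b)), m, n)
    @inbounds for j = 1:n
        for k = 1:p
            bkj = b[k, j]              # constant over the inner loop
            for i = 1:m
                c[i, j] += a[i, k] * bkj
            end
        end
    end
    return c
end

a = rand(20, 30)
b = rand(30, 40)
# Largest absolute difference from the built-in product
maxdiff = maximum(abs.(matmul_jki(a, b) .- a * b))
println("max difference vs a*b: ", maxdiff)
```

The result should agree with a*b up to floating-point rounding, since only
the order of the loops changes, not the arithmetic performed for each entry.
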