I started the Julia interpreter slightly differently this time:
OPENBLAS_NUM_THREADS=1 julia
This way I can reproduce your latest results:
julia> A=rand(5000,5000);
julia> B=rand(5000,5000);
julia> for i=1:10
           @time A*B;
       end
elapsed time: 11.424265373 seconds (226166468 bytes allocated)
elapsed time: 11.010769165 seconds (200000112 bytes allocated)
elapsed time: 11.018424186 seconds (200000112 bytes allocated)
elapsed time: 11.175638085 seconds (200000112 bytes allocated)
elapsed time: 11.023500157 seconds (200000112 bytes allocated)
elapsed time: 10.995496842 seconds (200000112 bytes allocated)
elapsed time: 11.000644216 seconds (200000112 bytes allocated)
elapsed time: 10.916478885 seconds (200000112 bytes allocated)
elapsed time: 10.990804315 seconds (200000112 bytes allocated)
elapsed time: 11.174570585 seconds (200000112 bytes allocated)
However, if I set OPENBLAS_NUM_THREADS to a value greater than 1, then I get
those weird results. I have a quad-core CPU.
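
The same comparison can probably be made inside a single session rather than by restarting with a different environment variable. This is a minimal sketch, assuming a recent Julia (1.x) where the LinearAlgebra standard library provides BLAS.set_num_threads; the older version used above spells this differently. time_matmul is a hypothetical helper, and the matrix size 1000 is an arbitrary choice to keep the run short:

```julia
using LinearAlgebra  # assumption: Julia 1.x, which provides BLAS.set_num_threads

# Hypothetical helper: time one n-by-n matrix product with a given
# number of BLAS threads and return the elapsed time in seconds.
function time_matmul(nthreads::Integer, n::Integer)
    BLAS.set_num_threads(nthreads)
    A = rand(n, n)
    B = rand(n, n)
    A * B                  # warm-up run so compilation is not timed
    return @elapsed A * B  # seconds for a single product
end

# Compare a few thread counts; 1, 2 and 4 match a quad-core machine.
for t in (1, 2, 4)
    println(t, " thread(s): ", time_matmul(t, 1000), " s")
end
```

If the timings get worse rather than better as the thread count goes up, that would point at the threaded BLAS path rather than at Julia itself.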