@Jutho: precomputing matrices doesn't make too much difference: julia> R = rand(100, 200 ... julia> Z = zeros(100, 200) ... julia> O = ones(100, 200) ...
julia> @time for i=1:1000 A * R end elapsed time: 2.845228639 seconds (320080000 bytes allocated, 8.31% gc time) julia> @time for i=1:1000 A * O end elapsed time: 2.811217686 seconds (320080000 bytes allocated, 8.33% gc time) julia> @time for i=1:1000 A * Z end elapsed time: 0.288209494 seconds (320080000 bytes allocated, 83.87% gc time) -------------------- @Thomas: no difference too: julia> @time for i=1:1000 A * R end elapsed time: 3.079787601 seconds (320080000 bytes allocated, 8.15% gc time) julia> gc() julia> @time for i=1:1000 A * R end elapsed time: 2.802706492 seconds (320080000 bytes allocated, 7.15% gc time) julia> gc() ... (5x more times with same results) julia> @time for i=1:1000 A * Z end elapsed time: 0.254381593 seconds (320080000 bytes allocated, 81.42% gc time) julia> gc() julia> @time for i=1:1000 A * Z end elapsed time: 0.252976879 seconds (320080000 bytes allocated, 81.44% gc time) julia> gc() !julia> @time for i=1:1000 A * R end elapsed time: 2.77465144 seconds (320080000 bytes allocated, 7.20% gc time) --------------------- There's, however, one interesting observation - the larger are matrices, the larger is different. For example, for A = rand(20, 10); R = rand(10, 20) and Z = zeros(10, 20) we have: julia> @time for i=1:1000000 A * R end elapsed time: 6.073379483 seconds (3296000000 bytes allocated, 40.88% gc time) !julia> @time for i=1:1000000 A * Z end elapsed time: 3.357091461 seconds (3296000000 bytes allocated, 74.58% gc time) Only 2 times faster. But for A = rand(2000, 1000); R = rand(1000, 2000) and Z = zeros() we have: julia> @time for i=1:10 A * R end elapsed time: 32.497353017 seconds (320001120 bytes allocated, 0.43% gc time) julia> @time for i=1:10 A * Z end elapsed time: 0.209265586 seconds (320001120 bytes allocated, 70.28% gc time) 155 times faster! Moreover, if I use binary matrix (half of elements is 1 and half is 0), time is 1/2 of time for random numbers: julia> B = zeros(size(R)) ... julia> B[R .> 0.5] = 1 1 julia> @time for i=1:10 A * B end elapsed time: 16.077884672 seconds (320001120 bytes allocated, 1.09% gc time) So time is proportional to a number of non-zero elements. For reference: I use Intel i7 processor with 2 cores, hyperthreading enabled (so virtually 4 cores), but only 1 core working.