I am continuing to investigate matrix multiplication performance. Today I found that multiplying by an array created with zeros(..) is several times faster than multiplying by an array created with ones(..) or by an array of random numbers:
    julia> A = rand(200, 100)
    ...

    julia> @time for i=1:1000 A * rand(100, 200) end
    elapsed time: 3.009730414 seconds (480160000 bytes allocated, 11.21% gc time)

    julia> @time for i=1:1000 A * ones(100, 200) end
    elapsed time: 2.973320655 seconds (480128000 bytes allocated, 12.72% gc time)

    julia> @time for i=1:1000 A * zeros(100, 200) end
    elapsed time: 0.438900132 seconds (480128000 bytes allocated, 85.46% gc time)

So A * zeros() is about 6-7 times faster than the other kinds of multiplication. Note also that the share of its time spent in GC is about 7 times higher (85% versus 11-13%).

NumPy shows no such difference:

    In [106]: %timeit dot(A, rand(100, 200))
    100 loops, best of 3: 2.77 ms per loop

    In [107]: %timeit dot(A, ones((100, 200)))
    100 loops, best of 3: 2.59 ms per loop

    In [108]: %timeit dot(A, zeros((100, 200)))
    100 loops, best of 3: 2.57 ms per loop

So I'm curious: how is multiplying by a matrix of zeros different from the other kinds of multiplication?
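One thing I could try to narrow this down is to build each operand once, outside the timed loop, so that only the multiplication itself is measured and not the cost of creating the second matrix. This is only a minimal sketch, assuming the same matrix sizes as in the timings I posted; `time_products` is a hypothetical helper name of my own, not part of Julia:

```julia
# Build each operand once so the timed loop measures only the multiplication.
A = rand(200, 100)
B_rand  = rand(100, 200)
B_ones  = ones(100, 200)
B_zeros = zeros(100, 200)

# Hypothetical helper (not from Julia itself): time n products of A and B.
function time_products(A, B, n)
    C = A * B                  # warm-up call so compilation is not timed
    t = @elapsed for i in 1:n
        C = A * B              # each product still allocates a new result
    end
    return t, C
end

for (name, B) in (("rand", B_rand), ("ones", B_ones), ("zeros", B_zeros))
    t, C = time_products(A, B, 1000)
    println(name, ": ", t, " seconds")
end
```

If the gap between zeros and the other two persists here, it would point at the multiplication itself rather than at how the operand is constructed.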
