I have a somewhat large matrix-vector product to calculate, where the matrix is not full rank. I can perform a low rank factorization of the nxn Matrix M as M=LL' where L is nxp. I'd like to use this information to speed up the computation. Here is a minimal example:
x=rand(10000,100)
v=rand(10000)
xx=x*x'
xt=x'
@time xx*v;
@time x*x'*v;
@time x*(x'*v);
@time x*(xt*v);
here are the timings:
julia> @time xx*v;
0.038634 seconds (8 allocations: 78.375 KB)
julia> @time x*x'*v;
0.385250 seconds (17 allocations: 770.646 MB, 13.14% gc time)
julia> @time x*(x'*v);
0.000662 seconds (10 allocations: 79.266 KB)
julia> @time x*(xt*v);
0.000819 seconds (10 allocations: 79.266 KB)
I would like to understand why x*(x'*v) is much faster than x*x'*v, and
uses less memory. I would have thought the order of operations would have
gone from right to left but it seems the parentheses made a huge
difference. I also might have expected that x' would just flick a TRANS='T"
switch in the BLAS implementation but by the looks of it there is a very
large amount of memory allocation going on in x*x'*v.
