I have a somewhat large matrix-vector product to calculate, where the 
matrix is not full rank. I can perform a low rank factorization of the nxn 
Matrix M as M=LL' where L is nxp.  I'd like to use this information to 
speed up the computation. Here is a minimal example:

x=rand(10000,100)                                                           
                                                                            
                                                           
v=rand(10000)                                                               
                                                                            
                                                           
xx=x*x'                                                                     
                                                                            
                                                           
xt=x'                                                                       
                                                                            
                                                           
@time xx*v;                                                                 
                                                                            
                                                           
@time x*x'*v;                                                               
                                                                            
                                                           
@time x*(x'*v);                                                             
                                                                            
                                                           
@time x*(xt*v);

here are the timings:

julia> @time xx*v;
  0.038634 seconds (8 allocations: 78.375 KB)
julia> @time x*x'*v;
  0.385250 seconds (17 allocations: 770.646 MB, 13.14% gc time)
julia> @time x*(x'*v);
  0.000662 seconds (10 allocations: 79.266 KB)
julia> @time x*(xt*v);
  0.000819 seconds (10 allocations: 79.266 KB)

I would like to understand why x*(x'*v) is much faster than  x*x'*v, and 
uses less memory. I would have thought the order of operations would have 
gone from right to left but it seems the parentheses made a huge 
difference. I also might have expected that x' would just flick a TRANS='T" 
switch in the BLAS implementation but by the looks of it there is a very 
large amount of memory allocation going on in x*x'*v.

Reply via email to