Dňa streda, 22. októbra 2014 11:56:31 UTC+2 Tim Holy napísal(-a):
>
> On Wednesday, October 22, 2014 02:27:30 AM Ján Dolinský wrote: 
> > I wonder why my vectorized code is not fast. I assume operations like 
> "X' * 
> > a" are using BLAS (e.g. BLAS.gemv()) and thus it should be as fast as a 
> > devectorized code at least in this case. 
>
> Yes, but their result requires allocating a temporary, and temporaries 
> mean 
> garbage collection. Try 
> - pre-allocating a couple of temp vectors to store the results of your 
> matrix 
> multiplies, and use functions like `Ac_mul_B!(dest, A, B)` to calculate 
> A'*B 
> and store the result in dest. 
> - Do the subtractions & divisions using a loop, as suggested by Steven. 
> - in each iteration you should call `view(X, :, l)` just once and store 
> the 
> result to a variable---no point recreating the same view 3 times per 
> iteration. 
>
> --Tim 
>

Hello Tim,

Thanks for the tips. I rewrote a bit the mgs routine so that it is more 
readable and did profiling.

using ArrayViews

function mgs(X)
# mgs speed project

  nobs, nvars = size(X)
  R = eye(nvars)

  for l=1:nvars-1
    v_i = view(X,:,l+1:nvars) # --- line 10
    s = view(X,:,l);
    
    R[l,l+1:nvars]=(v_i'*s)/sumabs2(s) # --- line 13
    X[:,l+1:nvars]=v_i-(s*view(R,l,l+1:nvars)) # --- line 14
  end

  X, R

end

Profilling is as follows:
1     linalg/blas.jl; gemm!; line: 526
10455 task.jl; anonymous; line: 96
 10455 REPL.jl; eval_user_input; line: 54
  10455 profile.jl; anonymous; line: 14
   2    ...me/Projects/julia-trials/mgs.jl; mgs; line: 7
    2 array.jl; eye; line: 176
     2 array.jl; fill!; line: 155
   2383 ...me/Projects/julia-trials/mgs.jl; mgs; line: 10
   *41   ...me/Projects/julia-trials/mgs.jl; mgs; line: 13*
    3  array.jl; ./; line: 756
    7  array.jl; ./; line: 758
    1  linalg/matmul.jl; gemv!; line: 217
     1 pointer.jl; pointer; line: 29
    16 multidimensional.jl; setindex!; line: 76
    3  multidimensional.jl; setindex!; line: 295
    6  reduce.jl; _mapreduce; line: 168
   *8027 ...me/Projects/julia-trials/mgs.jl; mgs; line: 14*
    31   ...3/ArrayViews/src/ArrayViews.jl; strided_view; line: 72
    1    abstractarray.jl; copy!; line: 152
    3109 array.jl; -; line: 719
    2770 array.jl; -; line: 721
    5    linalg/matmul.jl; *; line: 74
     1 abstractarray.jl; copy!; line: 150
    1807 multidimensional.jl; setindex!; line: 76
    1    multidimensional.jl; setindex!; line: 80
    303  multidimensional.jl; setindex!; line: 295
   1    abstractarray.jl; stride; line: 30
 
There is apparently a huge difference between line 13 and 14. 
Line 14 indeed deserves a loop as suggested by Steven. The culprit seems to 
be the minus operation. I'll explore the problem further.
Line 13 seems to be already reasonably fast but preallocating a space for 
output of (v_i'*s) and using something like "Ac_mul_B!(dest, v_i, s)" might 
provide further speedup.


It is very interesting, I'll play with it a bit further and post the 
improvements.

Jan

Reply via email to