>
> You can squeeze a tiny bit extra out of the implementation by using the 
> inplace gemv!, i.e.
> function mgs2(X)
> # mgs speed project
>    nobs, nvars = size(X)
>    R = eye(nvars)
>    for l=1:nvars-1
>      v_i = view(X,:,l+1:nvars)
>      s = view(X,:,l);
> r = vec(view(R,l,l+1:nvars))
>
>      BLAS.gemv!('T', 1.0/sumabs2(s), v_i, s, 0.0, r) # --- line 13 
>      BLAS.ger!(-1.0, s, r, v_i) # --- line 14
>    end
>   X, R
>  end
>
> Med venlig hilsen
>
> Andreas Noack
>

Indeed, a bit of speed and memory allocation improved too. I originally 
rejected the use of gemv! because I thought that additional multiplication 
of beta*y is unnecessary overhead but this seems not to be the case. I 
assume if beta = 0.0 then multiplication is not performed at all. 

This implementation is actually approaching the speed of original qr() 
function. On my other computer I get 0.17s with qr() and 0.22s with mgs() 
for X = rand(1000,1000). Julia is installed with OpenBLAS.

Thanks a lot, in was an excellent exercise to get more acquainted with 
Julia. 
Jan

p.s. I also learned that the above implementation actually overwrites 
original X matrix inplace so I renamed it to mgs!(X). I also did 
non-inplace version that would be:

using ArrayViews

function mgs(X)
# mgs speed project

  nobs, nvars = size(X)

  R = eye(nvars)
  Q = copy(X);

  for l=1:nvars-1
    v_i = view(Q,:,l+1:nvars)
    s = view(Q,:,l);
    r = vec(view(R,l,l+1:nvars))

    BLAS.gemv!('T', 1.0/sumabs2(s), v_i, s, 0.0, r)
    BLAS.ger!(-1.0, s, r, v_i)
  end

  Q, R

end

Speed is pretty much the same. I assume copy() and deepcopy() works the 
same for arrays.

Reply via email to