>
> You can squeeze a tiny bit extra out of the implementation by using the
> inplace gemv!, i.e.
> function mgs2(X)
> # mgs speed project
> nobs, nvars = size(X)
> R = eye(nvars)
> for l=1:nvars-1
> v_i = view(X,:,l+1:nvars)
> s = view(X,:,l);
> r = vec(view(R,l,l+1:nvars))
>
> BLAS.gemv!('T', 1.0/sumabs2(s), v_i, s, 0.0, r) # --- line 13
> BLAS.ger!(-1.0, s, r, v_i) # --- line 14
> end
> X, R
> end
>
> Med venlig hilsen
>
> Andreas Noack
>
Indeed, a bit of speed and memory allocation improved too. I originally
rejected the use of gemv! because I thought that additional multiplication
of beta*y is unnecessary overhead but this seems not to be the case. I
assume if beta = 0.0 then multiplication is not performed at all.
This implementation is actually approaching the speed of original qr()
function. On my other computer I get 0.17s with qr() and 0.22s with mgs()
for X = rand(1000,1000). Julia is installed with OpenBLAS.
Thanks a lot, in was an excellent exercise to get more acquainted with
Julia.
Jan
p.s. I also learned that the above implementation actually overwrites
original X matrix inplace so I renamed it to mgs!(X). I also did
non-inplace version that would be:
using ArrayViews
function mgs(X)
# mgs speed project
nobs, nvars = size(X)
R = eye(nvars)
Q = copy(X);
for l=1:nvars-1
v_i = view(Q,:,l+1:nvars)
s = view(Q,:,l);
r = vec(view(R,l,l+1:nvars))
BLAS.gemv!('T', 1.0/sumabs2(s), v_i, s, 0.0, r)
BLAS.ger!(-1.0, s, r, v_i)
end
Q, R
end
Speed is pretty much the same. I assume copy() and deepcopy() works the
same for arrays.