Thanks. I'll try to use loops too. 

While still exploring my vectorized code I found out that matrix-vector 
multiplication is a lot faster when going column-wise, e.g.

X = rand(8000,8000);
a = rand(8000);
julia> @time r = a' * X;
elapsed time: 0.131689512 seconds (128224 bytes allocated)
going row-wise (slow)

julia> @time r = X' * a;
elapsed time: 0.057772737 seconds (64168 bytes allocated)
going column-wise (fast)

I therefore rewrote 
R[l,l+1:p]=(view(X,:,l)'*view(X,:,l+1:nvars))/sumabs2(view(X,:,l))
into
R[l,l+1:p]=(view(X,:,l+1:nvars)'*view(X,:,l))/sumabs2(view(X,:,l))

and I got a little increase in performance
julia> @time Q,R = mgs(X);
elapsed time: 6.815198093 seconds (8017005584 bytes allocated, 39.99% gc 
time)

Original qr function can do the job a lot more faster. I'd like to approach 
this speed at the end.
julia> @time Q,R = qr(X);
elapsed time: 0.542084044 seconds (32580000 bytes allocated, 5.41% gc time)

I wonder why my vectorized code is not fast. I assume operations like "X' * 
a" are using BLAS (e.g. BLAS.gemv()) and thus it should be as fast as a 
devectorized code at least in this case.

Thanks,
Jan




Dňa utorok, 21. októbra 2014 17:16:10 UTC+2 Steven G. Johnson napísal(-a):
>
> Just write out the three nested loops of MGS, rather than using array 
> operations.  It will be a lot faster (in Julia).
>

Reply via email to