Thanks. I'll try to use loops too. While still exploring my vectorized code I found out that matrix-vector multiplication is a lot faster when going column-wise, e.g.
X = rand(8000,8000); a = rand(8000); julia> @time r = a' * X; elapsed time: 0.131689512 seconds (128224 bytes allocated) going row-wise (slow) julia> @time r = X' * a; elapsed time: 0.057772737 seconds (64168 bytes allocated) going column-wise (fast) I therefore rewrote R[l,l+1:p]=(view(X,:,l)'*view(X,:,l+1:nvars))/sumabs2(view(X,:,l)) into R[l,l+1:p]=(view(X,:,l+1:nvars)'*view(X,:,l))/sumabs2(view(X,:,l)) and I got a little increase in performance julia> @time Q,R = mgs(X); elapsed time: 6.815198093 seconds (8017005584 bytes allocated, 39.99% gc time) Original qr function can do the job a lot more faster. I'd like to approach this speed at the end. julia> @time Q,R = qr(X); elapsed time: 0.542084044 seconds (32580000 bytes allocated, 5.41% gc time) I wonder why my vectorized code is not fast. I assume operations like "X' * a" are using BLAS (e.g. BLAS.gemv()) and thus it should be as fast as a devectorized code at least in this case. Thanks, Jan Dňa utorok, 21. októbra 2014 17:16:10 UTC+2 Steven G. Johnson napísal(-a): > > Just write out the three nested loops of MGS, rather than using array > operations. It will be a lot faster (in Julia). >
