You can call blas_set_num_threads(1) in Julia so that BLAS uses only one
thread... but that is not really the answer to your question.
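
For reference, here is a minimal sketch of that single-thread setting. It assumes a current Julia (1.6 or later), where the call is spelled BLAS.set_num_threads and lives in the LinearAlgebra standard library, rather than blas_set_num_threads as in 0.4. The helper with_blas_threads and the random test data are my own illustration, not part of the code in this thread:

```julia
using LinearAlgebra  # Julia 1.x; in 0.4 the BLAS module was available from Base

# Hypothetical helper (not from the thread): run `f` with BLAS limited to
# `n` threads, then restore whatever thread count was set before.
function with_blas_threads(f, n)
    old = BLAS.get_num_threads()   # assumes Julia >= 1.6
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(old)
    end
end

# Made-up test data, only to have something to time.
A = rand(500, 500)
x = rand(500)

# Warm-up call so that compilation time is not measured.
with_blas_threads(() -> BLAS.gemv('T', 1.0, A, x), 1)

# Time the same BLAS call restricted to a single thread.
y = @time with_blas_threads(() -> BLAS.gemv('T', 1.0, A, x), 1)
```

Timing the BLAS version this way should at least show how much of the gap comes from multithreading and how much from the optimized kernels themselves.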
On Monday, 21 March 2016 at 9:49:23 UTC+1, Igor Cerovsky wrote:
>
> Hi,
>
> I wrote the Gram-Schmidt algorithm twice, once as a custom implementation
> and once using BLAS functions. On Julia 0.4.3 the custom version is more
> than 2 times slower than the BLAS version on my computer (Intel i7 6700HQ);
> on an older i7 5500 processor the BLAS version is only about 1.2 times
> faster. The code below is a bit long, but the slowdown only shows up in the
> whole context: profiling individual parts of the algorithm showed only
> slightly different performance.
>
> Custom implementation:
>
> function rank1update!(DST, A, R, k)
>     rows, cols = size(A)
>     for j = k+1:cols
>         @simd for i = 1:rows
>             @inbounds DST[i,j] -= A[i, k] * R[k, j]
>         end
>     end
> end
>
> function mygemv!(DST, A, k, alpha)
>     rows, cols = size(A)
>     for j in k+1:cols
>         s = 0.0
>         @simd for i in 1:rows
>             @inbounds s += A[i, k] * A[i, j]
>         end
>         DST[k, j] = s * alpha
>     end
> end
>
> function mgsf(M)
>     rows, cols = size(M)
>     Q = copy(M)
>     R = eye(cols)
>
>     for k in 1:cols
>         alpha = 1.0 / sumabs2(sub(Q, :, k))
>         mygemv!(R, Q, k, alpha)
>         rank1update!(Q, Q, R, k)
>     end
>     Q, R
> end
>
> Implementation using BLAS functions:
> function mgs_blas(M)
>     cols = size(M, 2)
>     Q = copy(M)
>     R = eye(cols)
>
>     for k in 1:cols
>         q_k = sub(Q, :, k)
>         Q_sub = sub(Q, :, k+1:cols)
>         R_sub = sub(R, k, k+1:cols)
>
>         alpha = 1.0 / sumabs2(q_k)
>         R[k, k+1:cols] = BLAS.gemv('T', alpha, Q_sub, q_k)
>         BLAS.ger!(-1.0, q_k, vec(R_sub), Q_sub)
>     end
>
>     Q, R
> end
>
> Results: with BLAS the speedup is about 2.1 times (0.71 s vs. 0.34 s):
>
> # custom implementation
> Q2, R2 = @time mgsf(T);
>
> 0.714916 seconds (4.99 k allocations: 15.411 MB, 0.08% gc time)
>
>
> # implementation using BLAS functions
>
> Q5, R5 = @time mgs_blas(T);
>
> 0.339278 seconds (16.45 k allocations: 23.521 MB, 0.76% gc time)
>
>
>
> A hint: judging by the performance graph in Task Manager, BLAS seems to
> use more cores.
> The question that remains: what is going on?
>
> Thanks for any explanation.
>
>