I called blas_set_num_threads(1) and got the same profile numbers.  This is using 
Apple’s BLAS.  

Maybe I’ll try Julia 0.5 and OpenBLAS for comparison.

> On 10 Sep 2016, at 2:34 AM, Andreas Noack <andreasnoackjen...@gmail.com> 
> wrote:
> 
> Try to time it again with threading disabled. Sometimes the threading 
> heuristics can cause unintuitive performance.
> 
> On Friday, September 9, 2016 at 6:39:13 AM UTC-4, Sheehan Olver wrote:
> 
> I have the following code that is part of a Householder routine, where 
> j::Int64, N::Int64, R.cols::Vector{Int64}, wp::Ptr{Float64}, M::Int64, 
> v::Ptr{Float64}:
> 
>   …
>         for j=k:N
>             v=r+(R.cols[j]+k-2)*sz
>             dt=BLAS.dot(M,wp,1,v,1)
>             BLAS.axpy!(M,-2*dt,wp,1,v,1)
>         end
>     …
> 
> 
> 
> For some reason, the BLAS.dot call takes 3x as long as the BLAS.axpy! call.  
> Is this expected, or is there something wrong?
> 
> 
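For reference, the arithmetic of the quoted loop can be written out as below. This is only a sketch: w stands for the length-M vector behind wp and v_j for the length-M vector behind v at iteration j (both symbols are introduced here for illustration), and the reflector interpretation assumes w has unit norm, which the thread does not state.

```latex
% One pass of the loop body, for each column index j = k, ..., N.
% BLAS.dot computes d_j; BLAS.axpy! applies the update in place.
\[
  d_j = w^{\mathsf T} v_j ,
  \qquad
  v_j \leftarrow v_j - 2\, d_j\, w = \bigl(I - 2\, w w^{\mathsf T}\bigr)\, v_j ,
  \qquad j = k, \dots, N .
\]
% Assumption: if \|w\|_2 = 1, then H = I - 2 w w^T is a Householder reflector.
```

Both steps cost roughly 2M floating-point operations per column, and the axpy! step additionally writes M values back to memory, so the operation count alone would not predict dot being 3x slower.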
