Regarding #2.

Looking at @code_llvm Test(A,B,C) we can see that a vector block is being 
generated, and @code_native shows packed/vector instructions, so apparently 
the compiler somehow manages to get some vectorization going...
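
For anyone who wants to repeat the check, here is a minimal sketch. The assumptions are mine, not from the original post: a recent Julia (0.7 or later, where SparseMatrixCSC lives in the SparseArrays standard library), random 1000x1000 test data, and a crude string search of the IR. Whether a "vector.body" block or <N x double> types actually show up depends on the Julia/LLVM version and the CPU, so treat the printed result as a hint only.

```julia
using SparseArrays        # assumption: Julia >= 0.7; in 0.4 this was part of Base
using InteractiveUtils    # provides code_llvm / code_native outside the REPL

# Same loop as in the quoted message (without `const` on the accumulator).
function Test(A::SparseMatrixCSC{Float64,Int64}, B::Vector{Float64},
              Y::Vector{Float64})
    n = size(A, 1)            # the matrix is assumed to be square
    colptr = A.colptr         # local aliases for the CSC fields
    rowval = A.rowval
    nzval  = A.nzval
    @inbounds for col = 1:n
        s = 0.0
        @simd for k = colptr[col]:(colptr[col+1]-1)
            s += B[rowval[k]] * nzval[k]
        end
        Y[col] = s
    end
    return Y
end

# Illustrative test data (sizes and density are arbitrary).
A = sprand(1000, 1000, 0.01)
B = rand(1000)
Y = zeros(1000)
Test(A, B, Y)

# Capture the LLVM IR as a string instead of printing it.
ir = sprint(io -> code_llvm(io, Test, Tuple{typeof(A),typeof(B),typeof(Y)}))

# Crude heuristic: look for the vectorized loop label or a vector-of-double type.
looks_vectorized = occursin("vector.body", ir) || occursin("x double>", ir)
println("IR contains vector code: ", looks_vectorized)
```

To see the machine code itself, code_native(stdout, Test, Tuple{typeof(A),typeof(B),typeof(Y)}) prints the assembly, where packed instructions can be looked for by eye.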

On Monday, February 29, 2016 at 3:30:14 PM UTC+1, Eduardo Lenz wrote:
>
> I have two silly questions about the example code below.
>
> 1) There is a small but noticeable speedup in this code if I define the 
> aliases
>     colptr = A.colptr
>     rowval = A.rowval
>     nzval  = A.nzval
>
> 2) Also, I would not expect a speed up from the @simd macro since the loop
>
>  @simd for k=colptr[col]:(colptr[col+1]-1)
>             @inbounds soma += B[rowval[k]] * nzval[k]
>   end
>
> does not have a unit stride, but the difference is also noticeable.
>
> As there is no mention of aliases in the Performance Tips, I would like to 
> know if there
> is a logical explanation for this. 
>
>
> # Test Code ( Y = A'*B, A is sparse) 
> function Test( A::SparseMatrixCSC{Float64,Int64}, B::Array{Float64}, 
> Y::Array{Float64})
>
>     # Let's assume it is square
>     n = size(A,1)
>
>     # Local alias 
>     colptr = A.colptr
>     rowval = A.rowval
>     nzval  = A.nzval
>
>     #Loops
>     @inbounds for col = 1:n
>         s = 0.0
>         @simd for k=colptr[col]:(colptr[col+1]-1)
>             @inbounds s += B[rowval[k]] * nzval[k]
>         end
>         Y[col] = s
>     end
>
> end
>
