Regarding #2.
Looking at @code_llvm Test(A,B,C) we can see that a vector block is being
generated, and @code_native shows packed/vector instructions, so the
compiler apparently does manage to get some vectorization going.
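As a sketch of how one might verify this (assuming the Test function from the quoted code below has been defined; the SparseArrays import reflects modern Julia, where sparse functionality moved out of Base):

```julia
using SparseArrays  # stdlib home of sprand/SparseMatrixCSC in Julia 1.x

# Small random inputs matching Test's signature (Float64 values, Int64 indices)
A = sprand(100, 100, 0.1)
B = rand(100)
Y = zeros(100)

# Print the LLVM IR: a block labeled "vector.body" (or operations on
# <N x double> vectors) indicates the loop was vectorized.
@code_llvm Test(A, B, Y)

# Print native assembly: packed instructions such as vmulpd/vaddpd
# (on x86 with AVX) are another sign of SIMD code generation.
@code_native Test(A, B, Y)
```

Whether the vector block is actually taken at runtime depends on the data, but its presence in the IR shows the compiler considered the loop vectorizable.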
On Monday, February 29, 2016 at 3:30:14 PM UTC+1, Eduardo Lenz wrote:
>
> I have two silly questions about the example code below.
>
> 1) There is a small, but noticeable speed up in this code if I define the
> alias
> colptr = A.colptr
> rowval = A.rowval
> nzval = A.nzval
>
> 2) Also, I would not expect a speed up from the @simd macro since the loop
>
> @simd for k=colptr[col]:(colptr[col+1]-1)
>     @inbounds s += B[rowval[k]] * nzval[k]
> end
>
> does not have unit stride, but the difference is also noticeable.
>
> As there is no mention of aliasing in the Performance Tips, I would like
> to know if there is a logical explanation for this.
>
>
> # Test Code ( Y = A*B, A is sparse)
> function Test( A::SparseMatrixCSC{Float64,Int64}, B::Array{Float64},
> Y::Array{Float64})
>
> # Let's assume it is square
> n = size(A,1)
>
> # Local alias
> colptr = A.colptr
> rowval = A.rowval
> nzval = A.nzval
>
> #Loops
> @inbounds for col = 1:n
> s = 0.0   # note: `const` is not valid in local scope
> @simd for k=colptr[col]:(colptr[col+1]-1)
> @inbounds s += B[rowval[k]] * nzval[k]
> end
> Y[col] = s
> end
>
> end
>