Kristoffer

Automatic SIMD vectorization is easily thrown off by seemingly
small changes in the code. In your case, it's probably the additional
complexity introduced by tuples that prevents vectorization.

Note that vectorization usually works by combining several loop
iterations into a single vector operation. Your code seems to be
written in the opposite way, where you expect each tuple to be
transformed into a single vector operation. This is in general more
difficult, since the compiler can't a priori be sure that the
operations performed on each tuple are identical. If that is indeed
the reason why you introduced tuples, then they shouldn't be
necessary; a simple array of scalars or a 2D-array would probably be
easier to process.
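
For example, if the data are stored as an n-by-4 matrix, each column
can be reduced with the same kind of loop as your simd_sum. This is
only a minimal sketch: the name column_sums and the matrix layout are
assumptions for illustration, and whether the inner loop is actually
vectorized depends on the Julia version and CPU, so it is worth
checking with @code_llvm.

```julia
# Hypothetical helper: sums each column of a matrix.
# Julia arrays are column-major, so the inner loop walks contiguous memory.
function column_sums(x::Matrix{Float64})
    n, m = size(x)
    s = zeros(Float64, m)
    for j in 1:m
        acc = 0.0
        # Same reduction pattern as simd_sum: several iterations of this
        # loop can be combined into one vector operation.
        @inbounds @simd for i in 1:n
            acc += x[i, j]
        end
        s[j] = acc
    end
    return s
end

data = rand(Float64, 20, 4)   # 20 samples, 4 components each
column_sums(data)
```

Here each of the four components gets its own scalar accumulator, so
the compiler only has to recognize an ordinary sum reduction.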

With the SIMD package (sorry! I see that you find it low-level), your
code would read

using SIMD

function tuple_simd_sum{T}(x::Vector{Vec{4, T}})
    s = Vec{4, T}(0)
    @inbounds for i in eachindex(x)
        s += x[i]
    end
    return s
end

tuple_vec = [Vec{4, Float64}((rand(), rand(), rand(), rand())) for i = 1:20]

@code_native tuple_simd_sum(tuple_vec)

and this code is nicely vectorized:

L32:
    vaddpd (%rcx), %ymm0, %ymm0
Source line: 62
    addq $-1, %rax
    addq $32, %rcx
    cmpq $0, %rax
    jne L32

Alternatively, you can also keep the array as an array of tuples,
creating Vecs from x[i] on the fly, and converting s back to a tuple
when you return. A Vec is essentially a tuple that supports arithmetic
operations.
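
A minimal sketch of that alternative, assuming Float64 elements and
that the SIMD package is installed. The name tuple_sum_via_vec is made
up for illustration, and it is worth confirming with @code_native that
this variant still vectorizes on your machine.

```julia
using SIMD

# Hypothetical helper: sums an array of 4-tuples by going through Vec.
function tuple_sum_via_vec(x::Vector{NTuple{4, Float64}})
    s = Vec{4, Float64}(0.0)           # accumulator with all four lanes zero
    @inbounds for i in eachindex(x)
        s += Vec{4, Float64}(x[i])     # build a Vec from the tuple on the fly
    end
    return (s[1], s[2], s[3], s[4])    # convert back to a plain tuple
end

tuple_vec = [(rand(), rand(), rand(), rand()) for i = 1:20]
tuple_sum_via_vec(tuple_vec)
```

The array keeps its element type, so the rest of your code does not
need to know about Vec at all.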


On Tue, Feb 2, 2016 at 8:12 AM, Kristoffer Carlsson
<[email protected]> wrote:
> For a simple reduction of an array I have code that vectorizes nicely:
>
> function simd_sum{T}(x::Vector{T})
>     s = zero(T)
>     @simd for i in eachindex(x)
>         @inbounds s = s + x[i]
>     end
>     return s
> end
>
>
> By looking at
>
> @code_llvm simd_sum(rand(Float64, 10))
>
>
> it can be seen that the loop is vectorized to use SIMD.
>
> However, for a similar loop using tuples:
>
> function tuple_simd_sum{T}(x::Vector{NTuple{4, T}})
>     s = (0.0, 0.0, 0.0, 0.0)
>     @inbounds @simd for i in eachindex(x)
>         x_i = x[i]
>         s = (s[1] + x_i[1], s[2] + x_i[2], s[3] + x_i[3], s[4] + x_i[4])
>     end
>     return s
> end
>
> tuple_vec = [(rand(), rand(), rand(), rand()) for i = 1:20]
>
> @code_llvm tuple_simd_sum(tuple_vec)
>
>
> The loop fails to use any vector instructions.
>
> Does anyone have more information on vectorizing operations that
> involve tuples, and is it possible to somehow write code that vectorizes
> with tuples?
>
> Thanks!
>
> // Kristoffer
>
> PS: I've seen https://github.com/eschnett/SIMD.jl but something a bit higher
> level would be nice.
>
>
>



-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/