Thanks for your reply, Erik. I am playing around with getting SIMD instructions to work with the GradientNumbers in ForwardDiff.jl (see https://github.com/JuliaDiff/ForwardDiff.jl/issues/98). In ForwardDiff, a GradientNumber is represented as a type with a Number field and an NTuple of Numbers field.
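As a rough illustration of that layout, here is a minimal sketch in current Julia (1.x) syntax. `ToyGradientNumber` and `toy_sum` are made-up names for this example and are not ForwardDiff's actual definitions; the point is only that adding two such numbers applies the same scalar addition to every slot of the tuple field.

```julia
# Hypothetical stand-in for a ForwardDiff GradientNumber: one value plus an
# N-tuple of partial derivatives. This is NOT ForwardDiff's real type.
struct ToyGradientNumber{N,T<:Real}
    value::T
    partials::NTuple{N,T}
end

# Addition applies the same scalar operation to every entry of the tuple field.
function Base.:+(a::ToyGradientNumber{N,T}, b::ToyGradientNumber{N,T}) where {N,T}
    return ToyGradientNumber{N,T}(a.value + b.value, map(+, a.partials, b.partials))
end

# Reduction over a vector of these numbers; the loop body boils down to a tuple sum.
function toy_sum(x::Vector{ToyGradientNumber{N,T}}) where {N,T}
    s = ToyGradientNumber{N,T}(zero(T), ntuple(_ -> zero(T), Val(N)))
    @inbounds for i in eachindex(x)
        s = s + x[i]
    end
    return s
end

xs = [ToyGradientNumber{2,Float64}(1.0, (1.0, 0.0)),
      ToyGradientNumber{2,Float64}(2.0, (0.0, 1.0))]
total = toy_sum(xs)
```

Whether a loop like this ends up using vector instructions depends on the Julia and LLVM versions, so it is worth checking with `@code_llvm` rather than assuming.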
The same operations are often applied to all the numbers in the tuple field; see https://github.com/JuliaDiff/ForwardDiff.jl/blob/ee40717f0f0941e369011907ee06e92e89ebae59/src/Partials.jl#L190 and below. The code generated for a reduction function called with a vector of GradientNumbers is therefore very similar to the tuple sum I posted, which is why I asked whether that could be vectorized. I will play around a bit with SIMD.jl and see how it works. Thanks.

On Tuesday, February 2, 2016 at 7:49:27 PM UTC+1, Erik Schnetter wrote:
>
> Kristoffer
>
> Automatic SIMD vectorization is easily thrown off guard by seemingly
> small changes in the code. In your case, it's probably the additional
> complexity introduced by tuples that makes it not vectorize.
>
> Note that usual vectorization works by combining several loop
> iterations into a single vector operation. Your code seems to be
> written in the opposite way, where you expect each tuple to be
> transformed into a single vector operation. This is in general more
> difficult, since the compiler can't a priori be sure that the
> operations performed on each tuple are identical. If that is indeed
> the reason why you introduced tuples, then they shouldn't be
> necessary; a simple array of scalars or a 2D-array would probably be
> easier to process.
>
> With the SIMD package (sorry! I see that you find it low-level), your
> code would read
>
> function tuple_simd_sum{T}(x::Vector{Vec{4, T}})
>     s = Vec{4, T}(0)
>     @inbounds for i in eachindex(x)
>         s += x[i]
>     end
>     return s
> end
>
> tuple_vec = [Vec{4, Float64}((rand(), rand(), rand(), rand())) for i = 1:20]
>
> @code_native tuple_simd_sum(tuple_vec)
>
> and this code is nicely vectorized:
>
> L32:
>         vaddpd  (%rcx), %ymm0, %ymm0
> Source line: 62
>         addq    $-1, %rax
>         addq    $32, %rcx
>         cmpq    $0, %rax
>         jne     L32
>
> Alternatively, you can also keep the array as array of tuples,
> creating Vecs from x[i] on the fly, and converting s back to a tuple
> when you return.
> A Vec is essentially a tuple that supports arithmetic
> operations.
>
>
> On Tue, Feb 2, 2016 at 8:12 AM, Kristoffer Carlsson wrote:
> > For a simple reduction of an array I have code that vectorizes nicely:
> >
> > function simd_sum{T}(x::Vector{T})
> >     s = zero(T)
> >     @simd for i in eachindex(x)
> >         @inbounds s = s + x[i]
> >     end
> >     return s
> > end
> >
> >
> > By looking at
> >
> > @code_llvm simd_sum(rand(Float64, 10))
> >
> >
> > it can be seen that the loop is vectorized to use SIMD.
> >
> > However, for a similar loop using tuples:
> >
> > function tuple_simd_sum{T}(x::Vector{NTuple{4, T}})
> >     s = (0.0, 0.0, 0.0, 0.0)
> >     @inbounds @simd for i in eachindex(x)
> >         x_i = x[i]
> >         s = (s[1] + x_i[1], s[2] + x_i[2], s[3] + x_i[3], s[4] + x_i[4])
> >     end
> >     return s
> > end
> >
> > tuple_vec = [(rand(), rand(), rand(), rand()) for i = 1:20]
> >
> > @code_llvm tuple_simd_sum(tuple_vec)
> >
> >
> > The loop fails to use any vector instructions.
> >
> > Does anyone have any more info regarding vectorization of operations
> > including tuples, and is it possible to somehow write code that vectorizes
> > with tuples?
> >
> > Thanks!
> >
> > // Kristoffer
> >
> > PS: I've seen https://github.com/eschnett/SIMD.jl but something a bit higher
> > level would be nice.
>
> --
> Erik Schnetter
> http://www.perimeterinstitute.ca/personal/eschnetter/
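
For reference, the tuple loop from the original question can also be written without hard-coding the tuple length or a Float64 accumulator. This is only a sketch in current Julia (1.x) syntax, which postdates the thread; `tuple_sum` is a name chosen for this example, and nothing here promises that the loop vectorizes.

```julia
# Sum a vector of N-tuples slot by slot; N and T come from the element type.
function tuple_sum(x::Vector{NTuple{N,T}}) where {N,T}
    s = ntuple(_ -> zero(T), Val(N))   # accumulator with the same element type as x
    @inbounds for i in eachindex(x)
        s = map(+, s, x[i])            # the same addition applied to every slot
    end
    return s
end

tuple_vec = [(1.0, 2.0, 3.0, 4.0) for i = 1:20]
result = tuple_sum(tuple_vec)
```

As in the thread, `@code_llvm tuple_sum(tuple_vec)` is the way to see whether vector instructions are actually emitted on a given setup.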
