Thanks for your reply, Erik.

I am playing around with getting SIMD instructions to work with the 
GradientNumbers in ForwardDiff.jl 
(see https://github.com/JuliaDiff/ForwardDiff.jl/issues/98). 
In ForwardDiff, a GradientNumber is represented as a type with a Number 
field and an NTuple of Numbers field.
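
As a rough illustration of that layout, here is a minimal sketch. The names 
DualSketch, value and partials are made up for this example and are not 
ForwardDiff's actual definitions, and the code uses current Julia syntax 
(struct, where) rather than the 0.4 syntax used elsewhere in this thread.

```julia
# Hypothetical stand-in for a gradient number: one value plus N partial derivatives.
# The names are illustrative only, not ForwardDiff's real types.
struct DualSketch{N, T<:Number}
    value::T
    partials::NTuple{N, T}
end

# Addition applies the same scalar operation to the value and to every partial.
Base.:+(a::DualSketch{N, T}, b::DualSketch{N, T}) where {N, T} =
    DualSketch{N, T}(a.value + b.value, map(+, a.partials, b.partials))

# Multiplication follows the product rule: (a*b)' = a'*b + a*b'.
Base.:*(a::DualSketch{N, T}, b::DualSketch{N, T}) where {N, T} =
    DualSketch{N, T}(a.value * b.value,
                     map((pa, pb) -> pa * b.value + a.value * pb,
                         a.partials, b.partials))

a = DualSketch{2, Float64}(1.0, (1.0, 0.0))
b = DualSketch{2, Float64}(2.0, (0.0, 1.0))
c = a + b
d = a * b
```

Every arithmetic operation touches each element of the partials tuple in the 
same way, which is why the tuple looks like a natural fit for a SIMD register.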

The same operation is often applied to all the numbers in the tuple field; 
see 
https://github.com/JuliaDiff/ForwardDiff.jl/blob/ee40717f0f0941e369011907ee06e92e89ebae59/src/Partials.jl#L190
and below.

The code generated for a reduction function called with a vector of 
GradientNumbers is therefore very similar to the tuple sum I posted, which 
is why I asked whether that could be vectorized.
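
For reference, the tuple reduction can also be written without spelling out 
each element, which is roughly the shape such code takes. This is only a 
sketch in current Julia syntax; tuple_map_sum is a made-up name, and whether 
the loop actually ends up vectorized depends on the Julia and LLVM versions, 
so it is worth checking with @code_llvm.

```julia
# Sum a vector of N-tuples elementwise; `tuple_map_sum` is an illustrative name.
function tuple_map_sum(x::Vector{NTuple{N, T}}) where {N, T}
    s = ntuple(_ -> zero(T), Val(N))   # accumulator with the input's element type
    @inbounds for i in eachindex(x)
        s = map(+, s, x[i])            # same operation applied to every tuple element
    end
    return s
end

tuple_vec = [(rand(), rand(), rand(), rand()) for i = 1:20]
total = tuple_map_sum(tuple_vec)
```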

I will play around a bit with SIMD.jl and see how it works.

Thanks.



On Tuesday, February 2, 2016 at 7:49:27 PM UTC+1, Erik Schnetter wrote:
>
> Kristoffer 
>
> Automatic SIMD vectorization is easily thrown off guard by seemingly 
> small changes in the code. In your case, it's probably the additional 
> complexity introduced by tuples that makes it not vectorize. 
>
> Note that usual vectorization works by combining several loop 
> iterations into a single vector operation. Your code seems to be 
> written in the opposite way, where you expect each tuple to be 
> transformed into a single vector operation. This is in general more 
> difficult, since the compiler can't a priori be sure that the 
> operations performed on each tuple are identical. If that is indeed 
> the reason why you introduced tuples, then they shouldn't be 
> necessary; a simple array of scalars or a 2D-array would probably be 
> easier to process. 
>
> With the SIMD package (sorry! I see that you find it low-level), your 
> code would read 
>
> function tuple_simd_sum{T}(x::Vector{Vec{4, T}}) 
>     s = Vec{4, T}(0) 
>     @inbounds for i in eachindex(x) 
>         s += x[i] 
>     end 
>     return s 
> end 
>
> tuple_vec = [Vec{4, Float64}((rand(), rand(), rand(), rand())) for i = 1:20] 
>
> @code_native tuple_simd_sum(tuple_vec) 
>
> and this code is nicely vectorized: 
>
> L32: 
>     vaddpd (%rcx), %ymm0, %ymm0 
> Source line: 62 
>     addq $-1, %rax 
>     addq $32, %rcx 
>     cmpq $0, %rax 
>     jne L32 
>
> Alternatively, you can also keep the array as array of tuples, 
> creating Vecs from x[i] on the fly, and converting s back to a tuple 
> when you return. A Vec is essentially a tuple that supports arithmetic 
> operations. 
>
>
> On Tue, Feb 2, 2016 at 8:12 AM, Kristoffer Carlsson wrote: 
> > For a simple reduction of an array I have code that vectorizes nicely: 
> > 
> > function simd_sum{T}(x::Vector{T}) 
> >     s = zero(T) 
> >     @simd for i in eachindex(x) 
> >         @inbounds s = s + x[i] 
> >     end 
> >     return s 
> > end 
> > 
> > 
> > By looking at 
> > 
> > @code_llvm simd_sum(rand(Float64, 10)) 
> > 
> > 
> > it can be seen that the loop is vectorized to use SIMD. 
> > 
> > However, for a similar loop using tuples: 
> > 
> > function tuple_simd_sum{T}(x::Vector{NTuple{4, T}}) 
> >     s = (zero(T), zero(T), zero(T), zero(T)) 
> >     @inbounds @simd for i in eachindex(x) 
> >         x_i = x[i] 
> >         s = (s[1] + x_i[1], s[2] + x_i[2], s[3] + x_i[3], s[4] + x_i[4]) 
> >     end 
> >     return s 
> > end 
> > 
> > tuple_vec = [(rand(), rand(), rand(), rand()) for i = 1:20] 
> > 
> > @code_llvm tuple_simd_sum(tuple_vec) 
> > 
> > 
> > The loop fails to use any vector instructions. 
> > 
> > Does anyone have more information about vectorizing operations 
> > on tuples, and is it possible to write code with tuples that 
> > vectorizes? 
> > 
> > Thanks! 
> > 
> > // Kristoffer 
> > 
> > PS: I've seen https://github.com/eschnett/SIMD.jl but something a bit 
> > higher level would be nice. 
> > 
> > 
> > 
>
>
>
> -- 
> Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
>
