Re: Coalescing double loads?

Dmitry Babokin Tue, 23 May 2017 16:02:36 -0700

<I replied to this email on the ispc-dev mailing list, but ispc-users is a
better fit for it, so I'm copy-pasting my reply here.>


Philip,

Note that i is a varying int, not a uniform one. That is, on the first
iteration it has the value (0,1,2,3) if you are compiling for avx1-i32x4
(or any other 4-wide target).

Hence 4*i has the value (0,4,8,12), which means array[4*i] is not a
contiguous load; it has to be a gather.
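
To make the index arithmetic concrete, here is a minimal scalar sketch in
plain C (not ISPC) that models one 4-wide gang. The helper names
lane_indices and is_contiguous are hypothetical and exist only for this
illustration; they are not part of ISPC.

```c
#include <stdbool.h>

/* Scalar model of one gang: lane k of the foreach loop holds i = first + k.
   Fills out[k] with the element index lane k reads for array[scale*i + offset]. */
static void lane_indices(int first, int width, int scale, int offset, int out[]) {
    for (int k = 0; k < width; ++k)
        out[k] = scale * (first + k) + offset;
}

/* A single vector load is only possible when consecutive lanes
   read consecutive elements. */
static bool is_contiguous(const int idx[], int width) {
    for (int k = 1; k < width; ++k)
        if (idx[k] != idx[k - 1] + 1)
            return false;
    return true;
}
```

With width 4, scale 4 and offset 0 the indices are 0, 4, 8, 12, which
is_contiguous rejects; with scale 1 they are 0, 1, 2, 3, which it accepts.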

So I assume your data layout is x, y, z, w, x, y, z, w,... In this case
you would need to use the aos_to_soa4() function, but it supports only
float and int32. The code (using float instead of double) would look like
this:

export void normalizeSOA(uniform float array[], uniform int count,
                         uniform float zeros[]) {
   float l2 = 0;
   for (uniform int i = 0; i < count; i += programCount*4) {
     float x, y, z, w;
     aos_to_soa4(&array[i], &x, &y, &z, &w);

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}
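
As a rough picture of what aos_to_soa4() does for one gang, here is a
scalar sketch in plain C. The name aos_to_soa4_model and its signature are
made up for illustration, and the real function uses vector shuffles rather
than a loop; this only shows which input element ends up in which lane.

```c
/* Scalar model for a gang of `width` lanes: reads 4*width consecutive
   floats laid out as x,y,z,w,x,y,z,w,... and gives lane k the four
   components of the k-th vector. */
static void aos_to_soa4_model(const float a[], int width,
                              float x[], float y[], float z[], float w[]) {
    for (int k = 0; k < width; ++k) {
        x[k] = a[4 * k + 0];
        y[k] = a[4 * k + 1];
        z[k] = a[4 * k + 2];
        w[k] = a[4 * k + 3];
    }
}
```

Because each call consumes 4*width floats, the loop above advances i by
programCount*4.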

A better solution would be to use an x, x, x, x, y, y, y, y, z, z, z, z, w,
w, w, w, x, x, x,... data layout (shown here for a 4-wide target). In this
case you'll be able to work with the data much more efficiently, because
every load is contiguous. The code would look like:

export void normalizeSOA(uniform double array[], uniform int count,
                         uniform double zeros[]) {
   double l2 = 0;
   for (uniform int i = 0; i < count; i += programCount*4) {
     // i is already an element offset (it advances by 4*programCount)
     double x = array[i + programIndex];
     double y = array[i + programCount + programIndex];
     double z = array[i + 2*programCount + programIndex];
     double w = array[i + 3*programCount + programIndex];

     l2 += x*x + y*y + z*z + w*w;
   }

   zeros[0] = reduce_add(l2);
}
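
If the data currently arrives as x, y, z, w, x, y, z, w,..., it has to be
repacked once into the blocked layout before a kernel like the one above
can use it. A minimal sketch of that repacking in plain C follows. The
helper names blocked_index and aos_to_blocked are hypothetical, `width`
stands in for programCount, and count (the total number of doubles) is
assumed to be a multiple of 4*width.

```c
#include <stddef.h>

/* Position of component c (0 = x .. 3 = w) of vector e in the blocked
   layout: blocks of 4*width doubles, each holding width x's, then
   width y's, width z's and width w's. */
static size_t blocked_index(size_t e, size_t c, size_t width) {
    size_t block = e / width;  /* which group of `width` vectors */
    size_t lane = e % width;   /* position inside that group */
    return block * 4 * width + c * width + lane;
}

/* Repack x,y,z,w,x,y,z,w,... into the blocked layout.
   Assumes count is a multiple of 4*width. */
static void aos_to_blocked(const double *aos, double *blocked,
                           size_t count, size_t width) {
    for (size_t e = 0; e < count / 4; ++e)
        for (size_t c = 0; c < 4; ++c)
            blocked[blocked_index(e, c, width)] = aos[4 * e + c];
}
```

The width used for repacking has to match the programCount of the target
the kernel is compiled for, which is the main cost of this layout.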

Hope it helps.

Dmitry.

On Tue, May 23, 2017 at 9:08 AM, Philip Thong <[email protected]>
wrote:

> Hi,
>
> I'm trying to compile a simple ISPC program:
>
> export void normalizeSOA(uniform double array[], uniform int count,
>                          uniform double zeros[]) {
>    double l2 = 0;
>    foreach (i = 0 ... count/4) {
>       double x = array[4*i];
>       double y = array[4*i+1];
>       double z = array[4*i+2];
>       double w = array[4*i+3];
>
>      l2 += x*x + y*y + z*z + w*w;
>    }
>
>    zeros[0] = reduce_add(l2);
> }
>
> I always get Performance warnings about the gather if the datatype is
> double.
>
> I would have expected the loads to be coalesced.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Intel SPMD Program Compiler Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
