Thank you, much appreciated!

On Monday, March 23, 2020 at 2:25:13 AM UTC-5, Dmitry Babokin wrote:
>
> The current code uses only "uniform" data and operations, so it is
> expected to use only xmm registers.
>
> I've asked folks who developed similar code to have a look and comment
> on the best strategies to express your code in ISPC.
>
> Dmitry.
>
> On Sun, Mar 22, 2020 at 2:42 PM David Nadaski <[email protected]> wrote:
>
>> I've also made an implementation that uses aos_to_soa4 / soa_to_aos4,
>> which does use YMM, but it's even slower than the other two:
>> https://ispc.godbolt.org/z/zNbbU2
>>
>> This is what I'm ultimately trying to implement in ispc so that it can
>> benefit from ispc's automatic AVXization etc.:
>> https://godbolt.org/z/DwaTuW
>>
>> On Sunday, March 22, 2020 at 4:56:37 AM UTC-5, David Nadaski wrote:
>>>
>>> Hi Dmitry, here it is: https://ispc.godbolt.org/z/8omkHp
>>>
>>> Basically I'm looking for the most optimized way of taking in an array
>>> of Vector4 from C++ in AOS form and doing calculations on them in ispc.
>>> If I use foreach, ispc complains about stores and loads and the
>>> generated code is slower, but it uses YMM.
>>> If I use the code above (on godbolt, simple for), the code is faster
>>> than the foreach version but lacks YMM, so I'm guessing it could be made
>>> more performant.
>>> Thank you for your help.
>>>
>>> David
>>>
>>> On Sunday, March 22, 2020 at 3:51:36 AM UTC-5, Dmitry Babokin wrote:
>>>>
>>>> David,
>>>>
>>>> Typically, if you target avx2 (using --target=avx2-i32x8), the code
>>>> should use ymm registers. If you use --target=avx2-i32x4, then your
>>>> data is 128 bits wide (for int and float vectors), which means that
>>>> xmm registers will be used.
>>>>
>>>> Another possibility is that you are using uniform float<4>, which
>>>> again means that you are operating on 128-bit vectors.
>>>>
>>>> If you are already using the avx2-i32x8 target, can you share the code
>>>> via a Compiler Explorer link, so I can see both the code and the
>>>> compilation flags?
>>>>
>>>> Dmitry.
>>>>
>>>> On Sat, Mar 21, 2020 at 4:17 PM David Nadaski <[email protected]> wrote:
>>>>
>>>>> Thank you Oleh!
>>>>>
>>>>> Upon checking the generated assembly, I'm noticing that it's compiled
>>>>> for AVX2 but not using YMM registers at all. Would you know why?
>>>>> I've compiled it using -O2.
>>>>>
>>>>> AddVec4sProper_avx2:
>>>>> 00007FF6F9E167E0  test     r8d,r8d
>>>>> 00007FF6F9E167E3  jle      AddVec4sProper_avx2+0B9h (07FF6F9E16899h)
>>>>> 00007FF6F9E167E9  lea      eax,[r8-1]
>>>>> 00007FF6F9E167ED  mov      r9d,r8d
>>>>> 00007FF6F9E167F0  and      r9d,3
>>>>> 00007FF6F9E167F4  cmp      eax,3
>>>>> 00007FF6F9E167F7  jae      AddVec4sProper_avx2+26h (07FF6F9E16806h)
>>>>> 00007FF6F9E167F9  xor      r10d,r10d
>>>>> 00007FF6F9E167FC  test     r9d,r9d
>>>>> 00007FF6F9E167FF  jne      AddVec4sProper_avx2+90h (07FF6F9E16870h)
>>>>> 00007FF6F9E16801  jmp      AddVec4sProper_avx2+0B9h (07FF6F9E16899h)
>>>>> 00007FF6F9E16806  sub      r8d,r9d
>>>>> 00007FF6F9E16809  mov      eax,30h
>>>>> 00007FF6F9E1680E  xor      r10d,r10d
>>>>> 00007FF6F9E16811  vmovss   xmm0,dword ptr [__real@3f800000 (07FF6FA0EAF20h)]
>>>>> 00007FF6F9E16819  nop      dword ptr [rax]
>>>>> 00007FF6F9E16820  vmovaps  xmm1,xmmword ptr [rdx+rax-30h]
>>>>> 00007FF6F9E16826  vaddss   xmm1,xmm1,xmm0
>>>>> 00007FF6F9E1682A  vmovaps  xmmword ptr [rcx+rax-30h],xmm1
>>>>> 00007FF6F9E16830  vmovaps  xmm1,xmmword ptr [rdx+rax-20h]
>>>>> 00007FF6F9E16836  vaddss   xmm1,xmm1,xmm0
>>>>> 00007FF6F9E1683A  vmovaps  xmmword ptr [rcx+rax-20h],xmm1
>>>>> 00007FF6F9E16840  vmovaps  xmm1,xmmword ptr [rdx+rax-10h]
>>>>> 00007FF6F9E16846  vaddss   xmm1,xmm1,xmm0
>>>>> 00007FF6F9E1684A  vmovaps  xmmword ptr [rcx+rax-10h],xmm1
>>>>> 00007FF6F9E16850  vmovaps  xmm1,xmmword ptr [rdx+rax]
>>>>> 00007FF6F9E16855  vaddss   xmm1,xmm1,xmm0
>>>>> 00007FF6F9E16859  vmovaps  xmmword ptr [rcx+rax],xmm1
>>>>> 00007FF6F9E1685E  add      r10,4
>>>>> 00007FF6F9E16862  add      rax,40h
>>>>> 00007FF6F9E16866  cmp      r8d,r10d
>>>>> 00007FF6F9E16869  jne      AddVec4sProper_avx2+40h (07FF6F9E16820h)
>>>>> 00007FF6F9E1686B  test     r9d,r9d
>>>>> 00007FF6F9E1686E  je       AddVec4sProper_avx2+0B9h (07FF6F9E16899h)
>>>>> 00007FF6F9E16870  shl      r10,4
>>>>> 00007FF6F9E16874  neg      r9d
>>>>> 00007FF6F9E16877  vmovss   xmm0,dword ptr [__real@3f800000 (07FF6FA0EAF20h)]
>>>>> 00007FF6F9E1687F  nop
>>>>> 00007FF6F9E16880  vmovaps  xmm1,xmmword ptr [rdx+r10]
>>>>> 00007FF6F9E16886  vaddss   xmm1,xmm1,xmm0
>>>>> 00007FF6F9E1688A  vmovaps  xmmword ptr [rcx+r10],xmm1
>>>>> 00007FF6F9E16890  add      r10,10h
>>>>> 00007FF6F9E16894  inc      r9d
>>>>> 00007FF6F9E16897  jne      AddVec4sProper_avx2+0A0h (07FF6F9E16880h)
>>>>> 00007FF6F9E16899  ret
>>>>> 00007FF6F9E1689A  nop      word ptr [rax+rax]
>>>>>
>>>>> On Saturday, March 21, 2020 at 12:21:43 PM UTC-5, Oleh Nechaev wrote:
>>>>>>
>>>>>> struct Vector4SOA
>>>>>> {
>>>>>>     float<4> V;
>>>>>> };
>>>>>>
>>>>>> export void Test(uniform Vector4SOA outs[], uniform Vector4SOA ins[], uniform int count)
>>>>>> {
>>>>>>     for (uniform int i = 0; i < count; ++i)
>>>>>>     {
>>>>>>         uniform Vector4SOA vv = ins[i];
>>>>>>         vv.V.x++; // builtin access by x y z w and r g b a
>>>>>>         outs[i] = vv;
>>>>>>     }
>>>>>> }
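For readers following the AOS versus SOA question in the quoted thread, here is a minimal C++ sketch of the layout change being discussed. It is not code from the thread or from ISPC: the names Vector4SoA, to_soa, add_to_x and to_aos are hypothetical, and Vector4 is assumed to be four packed floats. It only illustrates the idea that, once each component sits in its own contiguous array, a loop over one component touches consecutive floats, which is the access pattern wide (for example 256-bit) loads and adds are suited to. Whether a compiler actually emits such instructions depends on the target and flags.

```cpp
#include <cstddef>
#include <vector>

// Assumed AOS element: stands in for the Vector4 passed in from C++.
struct Vector4 { float x, y, z, w; };

// Hypothetical SOA container: one contiguous array per component.
struct Vector4SoA {
    std::vector<float> x, y, z, w;
};

// Transpose AOS -> SOA by copying each component into its own array.
Vector4SoA to_soa(const std::vector<Vector4>& aos) {
    Vector4SoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    soa.w.reserve(aos.size());
    for (const Vector4& v : aos) {
        soa.x.push_back(v.x);
        soa.y.push_back(v.y);
        soa.z.push_back(v.z);
        soa.w.push_back(v.w);
    }
    return soa;
}

// Add delta to every x component. The loop walks consecutive floats,
// so several elements can in principle be processed per instruction.
void add_to_x(Vector4SoA& soa, float delta) {
    for (std::size_t i = 0; i < soa.x.size(); ++i) {
        soa.x[i] += delta;
    }
}

// Transpose SOA -> AOS so the result can be handed back in the original layout.
std::vector<Vector4> to_aos(const Vector4SoA& soa) {
    std::vector<Vector4> aos(soa.x.size());
    for (std::size_t i = 0; i < aos.size(); ++i) {
        aos[i] = Vector4{soa.x[i], soa.y[i], soa.z[i], soa.w[i]};
    }
    return aos;
}
```

The two transposes are extra passes over the data, so for a single cheap operation like the x increment in the thread they can cost more than they save, which would be consistent with the slowdown reported for the aos_to_soa4 version. Keeping the data in SOA form across several operations is the usual way to amortize that cost.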
-- You received this message because you are subscribed to the Google Groups "Intel SPMD Program Compiler Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/ispc-users/41351dc0-539d-4685-a8c6-e66dde273fd2%40googlegroups.com.
