On Mon, Dec 5, 2016 at 5:09 PM, Connor Abbott <cwabbo...@gmail.com> wrote:
> On Mon, Dec 5, 2016 at 3:22 PM, Matt Turner <matts...@gmail.com> wrote:
>> On 12/05, Matt Turner wrote:
>>>
>>> On 11/28, Ian Romanick wrote:
>>>>
>>>> From: Ian Romanick <ian.d.roman...@intel.com>
>>>> Patches 42 through 50 enable the extension on BDW+.
>>>
>>>
>>> 42-48 are
>>>
>>> Reviewed-by: Matt Turner <matts...@gmail.com>
>>>
>>> I don't understand the 64-bit CMP issue, so I'm booting a SKL to see how
>>> fp64 works.
>>
>>
>> Ah, I think I see. Because 16x doubles take up 4 registers, we have to
>> emit two CMP instructions, one with 1Q and one with 2Q:
>>
>> cmp.ge.f0(8) null<1>DF g2.2<0,1,0>DF (abs)g11<4,4,1>DF { align1 1Q };
>> cmp.ge.f0(8) null<1>DF g2.2<0,1,0>DF (abs)g7<4,4,1>DF { align1 2Q };
>>
>> (from fs-op-add-double-double.shader_test)
>>
>> Makes sense to me. 49 is
>>
>> Reviewed-by: Matt Turner <matts...@gmail.com>
>
> Actually, it's something a little different. The splitting you're
> talking about is handled just fine by curro's SIMD lowering pass. The
> issue here is that if you don't specify a null destination register
> (in which case this is a moot point), CMP will always output the same
> destination bitsize as the source bitsize. That is, if you compare two
> registers with 8 doubles each (two SIMD8 registers each), the result
> will take up two SIMD8 registers instead of one as you'd expect. I
> couldn't track this down in the PRM, but I definitely remember having
> to implement it and getting wrong results without it. The end result
> is that you have to use a strided move to get the low 32 bits of each
> 64-bit destination channel, which is what subscript() does. This
> happens irrespective of whether you're compiling for SIMD8 or SIMD16.
> Of course, in this case some backend optimizations have managed to
> remove the destination register, so that's why you don't see it here,
> but if you do something trickier, like store the result to a buffer,
> the strided mov will be there.
>
> Anyways, that's what I remember of it... it's been a while.
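
For anyone following along, here is a rough Python model of the strided read described above. It is only a sketch and not the backend code: cmp_ge_df(), as_dwords() and low_dwords() are made-up names, and the all-ones-per-true-channel result is my assumption about what CMP writes to a non-null destination.

```python
import struct

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF

def cmp_ge_df(a, b):
    # Hypothetical model of CMP with DF sources and a non-null destination:
    # one 64-bit mask per channel (assumed all-ones when the test passes).
    return [ALL_ONES_64 if x >= y else 0 for x, y in zip(a, b)]

def as_dwords(qwords):
    # View the 64-bit channels as little-endian 32-bit dwords, the way
    # they would be laid out in the register file.
    raw = struct.pack("<%dQ" % len(qwords), *qwords)
    return list(struct.unpack("<%dI" % (2 * len(qwords)), raw))

def low_dwords(qwords):
    # Strided read: keep every second dword, i.e. the low 32 bits of each
    # 64-bit channel.
    return as_dwords(qwords)[0::2]

a = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 0.0, 2.5]
b = [0.0] * 8
wide = cmp_ge_df(a, b)      # 8 channels x 8 bytes
narrow = low_dwords(wide)   # 8 channels x 4 bytes
print(len(as_dwords(wide)) * 4, "bytes before,", len(narrow) * 4, "bytes after")
```

The 8-channel result goes from 64 bytes (two SIMD8 registers' worth) down to 32 bytes (one), which is the size a 32-bit boolean consumer expects.
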

Although the example you gave does have a bug, since the second CMP
overwrites the result of the first one. It looks like lower_simd_width
isn't offsetting the flag register correctly when splitting the CMP.

>
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev