On Fri, Sep 16, 2011 at 8:52 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>> So, either we can fix this by adding
>> reduc_{smin,smax,umin,umax}_v{32q,16h,8s,4d}i
>> patterns (at that point I guess I should just macroize them together with
>> the reduc_{smin,smax,umin,umax}_v{4sf,8sf,4df}) and handle the 4 32-byte
>> integer modes also in ix86_expand_reduc, or come up with some new optab
>
> Here is a patch that does it this way and also moves the umaxmin expanders
> one insn down to the right spot.
>
> I've noticed the <sse2_avx2>_lshr<mode>3 insn was modelled incorrectly
> for the 256-bit shift because, as the documentation says, it shifts each
> 128-bit lane separately, while it was modelled as a V4DImode shift
> (i.e. shifting each 64-bit chunk), and sse2_lshrv1ti3 was there just for
> the 128-bit variant, not the 256-bit one.
>
> Regtested on x86_64-linux and i686-linux on SandyBridge; unfortunately
> I don't have an AVX2 emulator, so the AVX2 assembly was just eyeballed.
> E.g. for the V16HImode reduction the difference with this patch is:
>
> -	vmovdqa	%xmm0, %xmm1
> -	vextracti128	$0x1, %ymm0, %xmm0
> -	vpextrw	$0, %xmm1, %eax
> -	vpextrw	$1, %xmm1, %edx
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$2, %xmm1, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$3, %xmm1, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$4, %xmm1, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$5, %xmm1, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$6, %xmm1, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$7, %xmm1, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$0, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$1, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$2, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$3, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$4, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$5, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$6, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovl	%eax, %edx
> -	vpextrw	$7, %xmm0, %eax
> -	cmpw	%ax, %dx
> -	cmovge	%edx, %eax
> +	vperm2i128	$1, %ymm0, %ymm0, %ymm1
> +	vpmaxsw	%ymm1, %ymm0, %ymm0
> +	vpsrldq	$8, %ymm0, %ymm1
> +	vpmaxsw	%ymm1, %ymm0, %ymm0
> +	vpsrldq	$4, %ymm0, %ymm1
> +	vpmaxsw	%ymm1, %ymm0, %ymm0
> +	vpsrldq	$2, %ymm0, %ymm1
> +	vpmaxsw	%ymm1, %ymm0, %ymm0
> +	vpextrw	$0, %xmm0, %eax
>
> 2011-09-16  Jakub Jelinek  <ja...@redhat.com>
>
> 	* config/i386/sse.md (VIMAX_AVX2): Change V4DI to V2TI.
> 	(sse2_avx, sseinsnmode): Add V2TI.
> 	(REDUC_SMINMAX_MODE): New mode iterator.
> 	(reduc_smax_v4sf, reduc_smin_v4sf, reduc_smax_v8sf,
> 	reduc_smin_v8sf, reduc_smax_v4df, reduc_smin_v4df): Remove.
> 	(reduc_<code>_<mode>): New smaxmin and umaxmin expanders.
> 	(sse2_lshrv1ti3): Rename to...
> 	(<sse2_avx2>_lshr<mode>3): ... this.  Use VIMAX_AVX2 mode
> 	iterator.  Move before umaxmin expanders.
> 	* config/i386/i386.h (VALID_AVX256_REG_MODE,
> 	SSE_REG_MODE_P): Accept V2TImode.
> 	* config/i386/i386.c (ix86_expand_reduc): Handle V32QImode,
> 	V16HImode, V8SImode and V4DImode.

OK for mainline SVN.

Thanks,
Uros.