https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |96208
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
PR96208 covers SLP of non-grouped loads. We can now convert short -> double,
and with the grouped load hacked in and -march=znver3 we get:
.L2:
vmovdqu (%rax), %ymm0
vpermq $27, -24(%rdi), %ymm10
addq $32, %rax
subq $32, %rdi
vpshufb %ymm7, %ymm0, %ymm0
vpermpd $85, %ymm10, %ymm9
vpermpd $170, %ymm10, %ymm8
vpermpd $255, %ymm10, %ymm6
vpmovsxwd %xmm0, %ymm1
vextracti128 $0x1, %ymm0, %xmm0
vbroadcastsd %xmm10, %ymm10
vcvtdq2pd %xmm1, %ymm11
vextracti128 $0x1, %ymm1, %xmm1
vpmovsxwd %xmm0, %ymm0
vcvtdq2pd %xmm1, %ymm1
vfmadd231pd %ymm10, %ymm11, %ymm5
vfmadd231pd %ymm9, %ymm1, %ymm2
vcvtdq2pd %xmm0, %ymm1
vextracti128 $0x1, %ymm0, %xmm0
vcvtdq2pd %xmm0, %ymm0
vfmadd231pd %ymm8, %ymm1, %ymm4
vfmadd231pd %ymm6, %ymm0, %ymm3
cmpq %rax, %rdx
jne .L2
That is, the 'short' data type forces a higher VF on us, and the splat
codegen I hacked in is still sub-optimal.
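
As a rough illustration of the two points above (a sketch for the reader, not
code from GCC; the helper names lanes, decode_perm_imm8 and splat_lane are
made up here): a 256-bit ymm register holds 16 shorts but only 4 doubles, so
one vector of shorts feeds four vectors of doubles, which matches the four
vcvtdq2pd / vfmadd231pd pairs in the listing. The second helper decodes the
vpermq/vpermpd immediates, showing that $85, $170 and $255 each broadcast one
lane while $27 reverses the four lanes.

```c
#include <stdint.h>

/* Number of elements of width elem_bits that fit in one vector register. */
unsigned lanes(unsigned vector_bits, unsigned elem_bits)
{
    return vector_bits / elem_bits;
}

/* Decode a vpermq/vpermpd imm8: sel[i] is the source lane that ends up in
   destination lane i (two selector bits per destination lane). */
void decode_perm_imm8(uint8_t imm, unsigned sel[4])
{
    for (unsigned i = 0; i < 4; i++)
        sel[i] = (imm >> (2 * i)) & 3u;
}

/* Return the broadcast lane if imm replicates a single source lane into all
   four destination lanes, or -1 if it is some other permutation. */
int splat_lane(uint8_t imm)
{
    unsigned sel[4];
    decode_perm_imm8(imm, sel);
    for (unsigned i = 1; i < 4; i++)
        if (sel[i] != sel[0])
            return -1;
    return (int)sel[0];
}
```

With these, lanes(256, 16) / lanes(256, 64) gives the 4x widening factor, and
splat_lane reports lanes 1, 2 and 3 for the three vpermpd immediates, with
lane 0 handled by vbroadcastsd. So each splat costs one permute after the
vpermq, which is presumably part of what makes the splat codegen sub-optimal.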
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
[Bug 96208] non-grouped load can be SLP vectorized for 2-element vectors case