https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
            Version|unknown                     |12.0
         Resolution|---                         |FIXED
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
The division is now vectorized; your short testcase produces:
t:
.LFB0:
        .cfi_startproc
        movss   f(%rip), %xmm4
        movss   test+8(%rip), %xmm3
        movq    test(%rip), %xmm0
        mulss   %xmm4, %xmm3
        movaps  %xmm4, %xmm1
        shufps  $0xe0, %xmm1, %xmm1
        mulps   %xmm1, %xmm0
        movhps  .LC0(%rip), %xmm1
        rcpps   %xmm1, %xmm2
        sqrtss  %xmm3, %xmm3
        mulps   %xmm2, %xmm1
        sqrtps  %xmm0, %xmm0
        divss   %xmm4, %xmm3
        mulps   %xmm2, %xmm1
        addps   %xmm2, %xmm2
        subps   %xmm1, %xmm2
        mulps   %xmm2, %xmm0
        movlps  %xmm0, test(%rip)
        movss   %xmm3, test+8(%rip)
        ret
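
For readers following the listing: the vector lanes do not use a divide instruction. The rcpps result is refined by the mulps/mulps/addps/subps sequence, which reads as one Newton-Raphson step for the reciprocal, x1 = 2*x0 - d*x0*x0, and the final mulps multiplies the sqrtps result by that refined reciprocal. The remaining scalar element still goes through divss. A minimal scalar sketch of that arithmetic in C follows; the function names and the x0 parameter (standing in for the hardware estimate) are hypothetical, chosen only for illustration, and this is not the code GCC generates from.

```c
/* One Newton-Raphson refinement step for 1/d.
   x0 is an initial estimate of 1/d (a stand-in for what rcpps returns).
   Computes x1 = 2*x0 - d*x0*x0, mirroring the instruction order above. */
static float refine_reciprocal(float d, float x0)
{
    float t = d * x0;      /* first mulps:  d * x0        */
    t = t * x0;            /* second mulps: d * x0 * x0   */
    float two_x0 = x0 + x0; /* addps:       2 * x0        */
    return two_x0 - t;     /* subps:        2*x0 - d*x0^2 */
}

/* Approximates n / d as n * (refined reciprocal of d),
   as the final mulps does for the vector lanes. */
static float divide_via_reciprocal(float n, float d, float x0)
{
    return n * refine_reciprocal(d, x0);
}
```

With d = 4 and a rough estimate x0 = 0.24, one step gives 0.2496, so the error drops from 1e-2 to 4e-4; each step roughly squares the relative error of the estimate.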