https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83479
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. -fschedule-insns helps in both cases, and so does -fno-tree-ter.
Just looking at the GIMPLE, the AVX512 case is somewhat weird:
  __m512d v8;
  double _59;
  vector(2) double _303;
...
  _59 = BIT_FIELD_REF <v8_187, 64, 384>;
  _303 = {_59};
  _305 = __builtin_ia32_broadcastsd512 (_303, __Y_304(D), 255);
  _302 = __builtin_ia32_sqrtpd512_mask (_305, __Y_301(D), -1, 4);
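For readers less used to these dumps: BIT_FIELD_REF <v8_187, 64, 384> reads the 64-bit field starting at bit 384, i.e. element 384 / 64 = 6 of the eight doubles, and _303/_305 then splat that scalar into every lane (mask 255 selects all eight) before the lane-wise sqrt. The following C sketch models only the extract-and-broadcast steps; the helper names are hypothetical, it assumes element-aligned bit offsets, and it is not the testcase from this PR.

```c
#include <stddef.h>

/* Hypothetical model of BIT_FIELD_REF <vec, 64, pos> on a vector of
   eight doubles: read the 64-bit field starting at bit POS, which for
   element-aligned positions is element POS / 64. */
static double bit_field_ref_v8df(const double vec[8], size_t pos_bits)
{
    return vec[pos_bits / 64];
}

/* Hypothetical model of _303 = {_59} followed by the all-lanes
   broadcast: every one of the eight lanes receives the same scalar. */
static void broadcast_v8df(double dst[8], double s)
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = s;
}
```

With v8 = {0, 1, 2, 3, 4, 5, 6, 7}, bit offset 384 selects 6.0, and every lane of the broadcast result is 6.0.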
But it seems this is how _mm512_set1_pd works:
extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_set1_pd (double __A)
{
  return (__m512d) __builtin_ia32_broadcastsd512 (__extension__
                                                  (__v2df) { __A, },
                                                  (__v8df)
                                                  _mm512_undefined_pd (),
                                                  (__mmask8) -1);
}
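The wrapper looks odd because the builtin is a masked broadcast. The (__v2df) { __A, } vector supplies the scalar in element 0, _mm512_undefined_pd () supplies the merge pass-through (this appears to be the __Y_304(D) operand in the dump), and (__mmask8) -1, i.e. 255, enables all eight lanes, so the pass-through is never observed. A scalar model of those semantics, with a hypothetical function name and assuming the documented merge-masking behaviour of vbroadcastsd:

```c
#include <stdint.h>

/* Hypothetical scalar model of
   __builtin_ia32_broadcastsd512 (src, passthru, mask), assuming
   merge masking: lane i gets src[0] when bit i of the mask is set,
   otherwise it keeps passthru[i]. */
static void broadcastsd512_model(double dst[8], const double src[2],
                                 const double passthru[8], uint8_t mask)
{
    for (int i = 0; i < 8; i++)
        dst[i] = ((mask >> i) & 1u) ? src[0] : passthru[i];
}
```

With an all-ones mask every lane equals src[0]; with mask 0x05 only lanes 0 and 2 do, and the rest keep their pass-through values.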
Given we now have VEC_DUPLICATE_EXPR, it would be nice to open-code
those builtins somehow (if not now, then for GCC 9).
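As a rough illustration of what open-coding could mean at the source level (an assumption for illustration, not the change made for this PR), GCC's generic vector extension can express the splat as a uniform vector constructor, which the middle end sees as an ordinary vector value rather than an opaque target builtin call. The type and function names are hypothetical.

```c
/* Hypothetical GCC/Clang generic vector type with the same size and
   element type as __m512d (eight doubles). */
typedef double v8df __attribute__((vector_size(64)));

/* Hypothetical open-coded set1: a uniform vector constructor instead of
   __builtin_ia32_broadcastsd512. The result is written through a pointer
   so the sketch also builds without -mavx512f, where GCC otherwise warns
   about passing 64-byte vectors by value (-Wpsabi). */
static void set1_v8df(v8df *out, double a)
{
    *out = (v8df){ a, a, a, a, a, a, a, a };
}
```

After set1_v8df (&v, 2.5), every one of the eight elements of v is 2.5.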