https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83479
--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 19 Dec 2017, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83479
>
> --- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #7)
> > but it seems this is how _mm512_set1_pd works:
> >
> > extern __inline __m512d
> > __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > _mm512_set1_pd (double __A)
> > {
> >   return (__m512d) __builtin_ia32_broadcastsd512 (__extension__
> >                                                   (__v2df) { __A, },
> >                                                   (__v8df)
> >                                                   _mm512_undefined_pd (),
> >                                                   (__mmask8) -1);
> > }
> >
> > given we now have VEC_DUPLICATE_EXPR it would be nice to open-code
> > those builtins somehow (or for GCC 9).
>
> The builtin handles the masking and zeroing/previous value, which is something
> the generic code can't easily handle.  But we could in backend gimple folder
> fold those into VEC_DUPLICATE_EXPR or VEC_PERM_EXPR with all zeros if the mask
> is all ones.

Yeah, but this is _mm512_set1_pd, not some masking intrinsic.

We'd need to think about how the generic vector extension can be used
to do a splat of course.  Apart from just writing

  return (__m512d) { __A, __A, __A, ... };

I suppose we expected that combine will never be able to match this
to the broadcast instruction, which presumably only exists with
all the bells and whistles.
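
For illustration, the "just writing" variant above can be spelled out with the
generic vector extension roughly as follows.  This is only a sketch: the names
v8df, splat8 and sum8 are hypothetical and are not what avx512fintrin.h uses,
and whether the constructor ends up as a single broadcast instruction depends
on the compiler version and flags.  Without -mavx512f GCC may also emit a
-Wpsabi note about the 64-byte vector return.

```c
/* Hypothetical 8 x double vector type built with the generic vector
   extension; it has the same size as __m512d but does not need
   <immintrin.h>.  */
typedef double v8df __attribute__ ((vector_size (64)));

/* Splat written as a plain vector constructor with every lane spelled
   out, i.e. without going through a target builtin.  */
static inline v8df
splat8 (double a)
{
  return (v8df) { a, a, a, a, a, a, a, a };
}

/* Helper for checking the result: sum the lanes via vector
   subscripting, which the extension allows.  */
static inline double
sum8 (v8df v)
{
  double s = 0.0;
  for (int i = 0; i < 8; i++)
    s += v[i];
  return s;
}
```

Compiling something like this with -O2 -mavx512f and looking at the assembly
is one way to see whether the constructor is recognized as a broadcast.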
