https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122095

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|X86_64                      |x86_64-*-*

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #3)
> There could be an STLF issue if there's a following 16-byte load from s1.
> (It's expensive to have a global view of how s1 is used, but since the type
> is __m128i, there's probably a 16-byte load for it.)

I agree that GCC's emitted code is "safer" in this regard when it's not on a
latency-critical path.

It's definitely a missed optimization for -Os and possibly when you put
this into a loop over an array of __m128i.

We might want to look into lowering _mm_insert_epi8 and friends to GIMPLE
(or implement the intrinsics in terms of the vector extension).
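
For illustration only, a minimal sketch of what "in terms of the vector
extension" could look like for a byte insert. The names v16qi_sketch and
insert_epi8_sketch are hypothetical and this is not how GCC's headers
currently define _mm_insert_epi8; it only assumes the GNU C vector_size
attribute and vector subscripting, so the middle end sees a plain element
assignment rather than a target builtin.

```c
#include <stdint.h>

/* Hypothetical 16 x 8-bit vector type (GNU C vector extension),
   standing in for a byte view of __m128i.  */
typedef int8_t v16qi_sketch __attribute__ ((vector_size (16)));

/* Hypothetical helper, NOT the real header implementation: replace one
   byte lane of V with VALUE.  The index is masked to 0..15 so the
   subscript is always in range.  */
static inline v16qi_sketch
insert_epi8_sketch (v16qi_sketch v, int value, int index)
{
  v[index & 15] = (int8_t) value;   /* plain element store on the vector */
  return v;
}
```

Written this way the insert is an ordinary vector element assignment, which
is the form the GIMPLE optimizers can reason about; whether that leads to
better code for the case in this PR is untested here.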
