https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92265

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and there's also the case where

VPINSR[BWDQ]

takes a GPR (or memory) to insert into an XMM reg.  PINSRW is available
with SSE2, the B/Q/D variants with SSE4.1.

It's also only the non-zero lane inserts that require an extra move
if the above are not available.  There's also a memory move to the
upper half for DImode memory sources.
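
To make the two insert cases concrete, here is a minimal sketch using
GCC's generic vector extension.  It is not the testcase from this PR;
the type name v4si and both function names are made up for illustration,
and what the compiler emits depends on its version and on the -m flags
in use.

```c
/* Hypothetical illustration only, not the testcase from this PR.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane 0 insert from a GPR into an otherwise zero vector.  A plain
   GPR -> XMM move is typically enough for this case.  */
v4si
insert_lane0 (int x)
{
  v4si v = { 0, 0, 0, 0 };
  v[0] = x;
  return v;
}

/* Non-zero lane insert.  With SSE4.1 enabled this can be a single
   PINSRD; without it the value has to be moved into a vector
   register first and then merged into place.  */
v4si
insert_lane2 (v4si v, int x)
{
  v[2] = x;
  return v;
}
```

Comparing the assembly with and without -msse4.1 should show whether
the extra move appears for the second function.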

Note the larger store might still be good to reduce needed store
bandwidth and to avoid later STLF (store-to-load forwarding) issues
when a vector load follows.

But some targets have non-trivial move cost between register files
(not Intel though).

So the question is whether your example makes a difference in practice.
