https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> so we're building SImode elements in %xmm regs and then
> unpack them - that's probably better than a series of
> pinsrw due to dependences.  For uarchs where grp->xmm
> moves are costly it might be better to do
>
>   pxor %xmm0, %xmm0
>   pinsrw $0, (%rsi), %xmm0
>   pinsrw $1, 32(%rsi), %xmm0
>
> though?

I'm afraid that is impossible, pinsrw will attempt to load 2 bytes, but only 1
is accessible (if at end of page).
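
To make the end-of-page concern concrete, here is a minimal C sketch of the check involved. It assumes a power-of-two page size (4096 is used below as a typical x86-64 value, not something stated in this report), and load_crosses_page is a hypothetical helper written for illustration, not anything in GCC. A memory-operand pinsrw reads 16 bits, so it corresponds to the size == 2 case, while the source only guarantees that 1 byte is readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): returns true if a load of
   `size` bytes starting at `addr` touches a byte beyond the page that
   contains `addr`.  `page_size` is assumed to be a power of two. */
static bool load_crosses_page(uintptr_t addr, size_t size, size_t page_size)
{
    /* The mask trick below is only valid for power-of-two page sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Offset of the first accessed byte within its page. */
    uintptr_t offset_in_page = addr & (page_size - 1);

    /* The load spills into the next page if it extends past the page end. */
    return offset_in_page + size > page_size;
}
```

With 4096-byte pages, a 1-byte load from the last byte of a page (for example address 0x1fff) stays within that page, whereas a 2-byte load from the same address also touches the first byte of the following page. If that following page is unmapped, the wider load faults even though the original byte access was valid.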