https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80820
Venkataramanan <venkataramanan.kumar at amd dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |venkataramanan.kumar at amd dot com

--- Comment #4 from Venkataramanan <venkataramanan.kumar at amd dot com> ---
(In reply to Peter Cordes from comment #0)
> gcc with -mtune=generic likes to bounce through memory when moving data from
> integer registers to xmm for things like _mm_set_epi32.
>
> There are 3 related tuning issues here:
>
> * -mtune=haswell -mno-sse4 still uses one store/reload for _mm_set_epi64x.
>
> * -mtune=znver1 should definitely favour movd/movq instead of store/reload.
> (Ryzen has 1 m-op movd/movq between vector and integer with 3c latency,
> shorter than store-forwarding. All the reasons to favour store/reload on
> other AMD uarches are gone.)
>

Yes, for Ryzen, using direct move instructions should be better than using store-forwarding.
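To make the two strategies concrete, here is a minimal C sketch of the _mm_set_epi64x case (an illustration only, not GCC's output or a test case from this bug). It assumes x86-64 with SSE2, and the helper names are hypothetical. One helper builds the vector by storing both halves to memory and reloading them, the other by two direct integer-to-xmm movq moves plus punpcklqdq, which is the kind of sequence that avoids store-forwarding. The C only expresses intent: the compiler is free to emit either sequence for either helper, depending on the -mtune cost model, which is exactly what this report is about.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: store/reload strategy.
   Write both 64-bit halves to memory, then load them as one vector
   (relies on store-forwarding from the two scalar stores). */
static __m128i set64_store_reload(uint64_t hi, uint64_t lo) {
    uint64_t buf[2] = { lo, hi };  /* element 0 is the low qword */
    return _mm_loadu_si128((const __m128i *)buf);
}

/* Hypothetical helper: direct-move strategy.
   movq each GPR into an xmm register, then interleave the low qwords
   (punpcklqdq). No round trip through memory. */
static __m128i set64_direct(uint64_t hi, uint64_t lo) {
    __m128i vlo = _mm_cvtsi64_si128((long long)lo);
    __m128i vhi = _mm_cvtsi64_si128((long long)hi);
    return _mm_unpacklo_epi64(vlo, vhi);  /* low = lo, high = hi */
}

/* Compare two vectors byte for byte. */
static int same_vector(__m128i a, __m128i b) {
    unsigned char x[16], y[16];
    _mm_storeu_si128((__m128i *)x, a);
    _mm_storeu_si128((__m128i *)y, b);
    return memcmp(x, y, sizeof x) == 0;
}

/* Both strategies must produce the same value as _mm_set_epi64x. */
static void check_set64(uint64_t hi, uint64_t lo) {
    __m128i ref = _mm_set_epi64x((long long)hi, (long long)lo);
    assert(same_vector(set64_store_reload(hi, lo), ref));
    assert(same_vector(set64_direct(hi, lo), ref));
}
```

Compiling this with, for example, `gcc -O2 -S -mtune=generic` and again with `-mtune=znver1`, then comparing the assembly, is one way to see which sequence each tuning picks for the _mm_set_epi64x call.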