https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833

--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
I don't think it's worth anyone's time to implement this in 2017, but using MMX
regs for 64-bit store/load would be faster on really old CPUs that split 128-bit
vector insns into two halves, like K8 and Pentium M.  Especially with
-mno-sse2 (e.g. Pentium III compat), where movlps has a false dependency on the
old value of the xmm reg but movq mm0 doesn't.  (No SSE2 means we can't use MOVQ
or MOVSD to load into an XMM reg.)

MMX is also a saving in code-size: one fewer prefix byte vs. SSE2 integer
instructions.  It's also another set of 8 registers for 32-bit mode.
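
To make the code-size point concrete, here is a small sketch comparing
hand-assembled 32-bit encodings of the same operation in MMX and SSE2 form.
The array and function names are made up for illustration, and the register
and addressing-mode choices (mm0/mm1, xmm0/xmm1, [eax]) are arbitrary; this is
not something GCC emits or uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-assembled machine code for 32-bit mode (illustrative names only). */

/* paddb mm0, mm1: opcode 0F FC, ModRM C1, no mandatory prefix */
static const unsigned char paddb_mmx[]  = { 0x0F, 0xFC, 0xC1 };
/* paddb xmm0, xmm1: same opcode with a 66 mandatory prefix */
static const unsigned char paddb_sse2[] = { 0x66, 0x0F, 0xFC, 0xC1 };

/* movq mm0, [eax]: opcode 0F 6F, ModRM 00 */
static const unsigned char movq_load_mmx[]  = { 0x0F, 0x6F, 0x00 };
/* movq xmm0, [eax]: opcode 0F 7E with an F3 mandatory prefix */
static const unsigned char movq_load_sse2[] = { 0xF3, 0x0F, 0x7E, 0x00 };

/* Number of bytes the MMX encoding saves over the SSE2 encoding. */
static size_t bytes_saved(size_t sse2_len, size_t mmx_len)
{
    return sse2_len - mmx_len;
}
```

For both pairs, bytes_saved(sizeof sse2_form, sizeof mmx_form) is 1: the only
difference is the mandatory prefix byte on the SSE2 form.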

But Skylake has lower throughput for the MMX versions of some instructions than
for the XMM versions.  And SSE4 instructions like PEXTRD don't have MMX
versions, unlike SSSE3 and earlier (e.g. pshufb mm0, mm1 is available, and on
Conroe it's faster than the xmm version).
