https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> The question is for which CPUs is it actually faster to use SSE?

In the context of chains where both the source and the destination need to be
SSE registers, pretty much all CPUs? Inter-unit moves typically have some
latency, e.g. recent AMD (since Zen) and Intel (Skylake) have latency 3 for
sse<->gpr moves (surprisingly, though, the four generations prior to Skylake
had latency 1). Older AMDs with a shared FPU had even worse latencies. At the
same time, SSE integer ops have latency and throughput comparable to gpr ones,
so generally moving a chain to SSE ops doesn't make it slower. Plus, it helps
with register pressure.
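
To make "a chain whose source and destination are SSE registers" concrete,
here is a minimal sketch, not taken from the bug report. The function names
are hypothetical, and which instructions a given compiler actually emits for
the first function is not guaranteed. Under the x86-64 SysV ABI a float
argument arrives in an SSE register and is returned in one, so an integer
mask in between either takes an sse->gpr->sse round trip or stays in SSE.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: clear the sign bit through an integer temporary.
   A compiler may lower this to a move to a general register, an integer
   AND, and a move back, paying the inter-unit latency twice. */
float fabs_gpr_style(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret float as integer */
    bits &= 0x7fffffffu;              /* integer op on the bit pattern */
    memcpy(&x, &bits, sizeof x);      /* reinterpret back as float */
    return x;
}

/* Same chain written so that the value never has to leave the SSE domain.
   Falls back to the portable version when SSE2 is not available. */
float fabs_sse_style(float x)
{
#if defined(__SSE2__)
    __m128 v    = _mm_set_ss(x);
    __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    return _mm_cvtss_f32(_mm_and_ps(v, mask));
#else
    return fabs_gpr_style(x);
#endif
}
```

Both functions return the same value for every input; the only difference
is which register file the intermediate bit operation is likely to use.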

When either the source or the destination of a chain is bound to a general
register or memory, it's OK to keep doing it on general regs.
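
For contrast, a sketch of a chain whose destination is bound to a general
register (again a hypothetical function, assuming an ABI where the int
result is returned in a general register): one sse->gpr move is needed
anyway, so doing the shift on the general-register side costs nothing extra.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: extract the sign bit of a float as an int.
   The result has to end up in a general register regardless, so the
   integer shift can reasonably be done there. */
int sign_bit(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret float as integer */
    return (int)(bits >> 31);         /* 1 if the sign bit is set, else 0 */
}
```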
