https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On January 8, 2020 4:34:40 PM GMT+01:00, "amonakov at gcc dot gnu.org"
<gcc-bugzi...@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
>
>--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
>> The question is for which CPUs is it actually faster to use SSE?
>
>In the context of chains where the source and the destination need to be
>SSE registers, pretty much all CPUs? Inter-unit moves typically have some
>latency, e.g. recent AMD (since Zen) and Intel (Skylake) have latency 3
>for sse<->gpr moves (surprisingly, the four generations prior to Skylake
>had latency 1). Older AMDs with a shared fpu had even worse latencies.
>At the same time, SSE integer ops have latencies and throughput comparable
>to gpr ones, so moving a chain to SSE ops generally doesn't make it
>slower. Plus it helps with register pressure.
>
>When either the source or the destination of a chain is bound to a
>general register or memory, it's ok to continue doing it on general regs.

But we need an extra load for the constant operand with an SSE op.
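
For illustration, a minimal C sketch (my own, not necessarily the PR's
testcase; the mask constant is hypothetical) of the kind of chain being
discussed: the value starts and ends in an SSE register, the bitwise op is
currently done on general registers, and converting it to an SSE op would
trade the gpr<->sse moves for a constant-pool load of the mask.

#include <stdint.h>
#include <string.h>

/* Hypothetical example.  The float arrives and leaves in an xmm
   register, but the AND is a scalar integer op, so the compiler has to
   move xmm -> gpr and back.  Doing the AND with an SSE op (e.g.
   andps/pand) avoids the inter-unit moves, at the cost of loading the
   0x7fffffff mask from the constant pool instead of encoding it as an
   immediate, which is the extra load mentioned above.  */
float
clear_sign (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);   /* xmm -> gpr move                 */
  bits &= 0x7fffffff;                /* immediate operand with a gpr op */
  memcpy (&x, &bits, sizeof bits);   /* gpr -> xmm move back            */
  return x;
}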
