https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On January 8, 2020 4:34:40 PM GMT+01:00, "amonakov at gcc dot gnu.org" <gcc-bugzi...@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
>
> --- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
>> The question is for which CPUs is it actually faster to use SSE?
>
> In the context of chains where the source and the destination need to be SSE
> registers, pretty much all CPUs? Inter-unit moves typically have some latency,
> e.g. recent AMD (since Zen) and Intel (Skylake) have latency 3 for sse<->gpr
> moves (surprisingly, though, four generations prior to Skylake had latency 1).
> Older AMDs with shared FPU had even worse latencies. At the same time, SSE
> integer ops have comparable latencies and throughput to GPR ones, so generally
> moving a chain to SSE ops isn't making it slower. Plus it helps with register
> pressure.
>
> When either the source or the destination of a chain is bound to a general
> register or memory, it's ok to continue doing it on general regs.

But we need an extra load for the constant operand with an SSE op.