Or at least for x64? My reasons:

1. I'm still seeing bug reports about missing emms here and there.
2. On Intel CPUs newer than Skylake, even if we use only the lower 64
bits of an XMM register, SSE2 instructions have better throughput than
their MMX counterparts at the same latency. On AMD CPUs the two have
identical latency and throughput.
3. GCC has done the same for its MMX intrinsics
(https://gcc.gnu.org/legacy-ml/gcc-patches/2019-02/msg00061.html)
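
To illustrate points 1 and 2, here is a minimal sketch using C intrinsics.
It is not FFmpeg code, and the helper name add8_sat_u8 is made up for the
example. It does an 8-byte saturating add in the low half of an XMM
register, the job a paddusb on mm registers would do. No MMX state is
touched, so no emms is needed afterwards. The scalar branch is only there
so the snippet still builds where SSE2 is unavailable.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for i in 0..7. */
static void add8_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
    /* movq loads: only the low 64 bits are read, the upper half is zeroed. */
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    /* paddusb on XMM registers; the upper 8 lanes are computed but ignored. */
    __m128i vs = _mm_adds_epu8(va, vb);
    /* movq store: writes only the low 64 bits. No emms required. */
    _mm_storel_epi64((__m128i *)dst, vs);
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = s > 255 ? 255 : (uint8_t)s;
    }
#endif
}
```

The loads and the store never touch memory beyond the 8 bytes, so a
procedure like this could stand in for an MMX one without changing its
buffer requirements.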

Possible counterarguments (I don't think they apply to reasonably modern CPUs):

1. Most SSE2 instructions are one byte longer than their MMX
counterparts (the 0x66 prefix)
2. Worse performance on old CPUs with a 64-bit FPU (P4 or older, K8 or older)

We can either rewrite existing MMX procedures in SSE2 (and replace them
outright, if we don't care about old hardware at all), or apply some
trick to INIT_MMX so that it generates SSE2 code on x64. The procedures
in question are mostly SIMD code with a small footprint that naturally
fits into 64 bits.

What do you think?

--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6
_______________________________________________
ffmpeg-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]