Or at least for x64? My reasons:

1. I'm still seeing bug reports about missing emms here and there.
2. On Intel CPUs newer than Skylake, SSE2 instructions have better throughput than their MMX counterparts at the same latency, even if we use only the lower 64 bits of an XMM register. On AMD CPUs, latency and throughput are identical.
3. gcc has done the same for intrinsics (https://gcc.gnu.org/legacy-ml/gcc-patches/2019-02/msg00061.html).
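
To illustrate point 2, here is a minimal sketch in C intrinsics of an 8-byte operation done in the lower half of an XMM register. It is not taken from FFmpeg, and the helper name add_sat_u8x8 is hypothetical. It only shows that a routine which naturally fits into 64 bits can use SSE2 loads, arithmetic and stores without touching the MMX/x87 state.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: saturating add of 8 unsigned bytes, the kind of
 * 64-bit-wide operation traditionally written with MMX (paddusb mm, mm). */
static void add_sat_u8x8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    /* Load 8 bytes into the low half of an XMM register; the upper
     * 64 bits are zeroed. Unaligned addresses are fine here. */
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);

    /* Same operation as the MMX paddusb, applied to an XMM register. */
    __m128i sum = _mm_adds_epu8(va, vb);

    /* Store only the low 64 bits back to memory. */
    _mm_storel_epi64((__m128i *)dst, sum);

    /* No emms / _mm_empty() is needed: the MMX/x87 state is never used. */
}
```

Since SSE2 is part of the x64 baseline, code like this needs no runtime CPU check there.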

Possible counterarguments (I don't think they apply to reasonably modern CPUs):

1. Most SSE2 instructions are one byte longer.
2. Worse performance on old CPUs with a 64-bit FPU (P4 or older, K8 or older).

We can either rewrite existing MMX procedures in SSE2 (and replace them outright, if we don't care about old hardware at all), or do some trick in INIT_MMX so that it generates SSE2 code on x64. The procedures in question are mostly SIMD code targeting a small footprint that naturally fits into 64 bits.

What do you think?

-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6
