Hi,

On Sat, Jan 7, 2012 at 9:06 PM, Ronald S. Bultje <[email protected]> wrote:
> Also implement SSE2/AVX variants.
> ---
>  libswscale/Makefile               |    3 +-
>  libswscale/x86/input.asm          |  249 +++++++++++++++++++++++++++++++++++++
>  libswscale/x86/swscale_mmx.c      |   87 +++++++++++++
>  libswscale/x86/swscale_template.c |  163 ------------------------
>  4 files changed, 338 insertions(+), 164 deletions(-)
>  create mode 100644 libswscale/x86/input.asm
[..]

I measured this. The interesting thing is that vpand appears to hurt,
not help, so I'll probably get rid of that and thus of the
yuyvToY_avx function. In all other cases we see small speedups
compared to the old inline asm for MMX code, and great speedups
because of SSE2 and AVX.

Numbers:

yuyv, avx:
  luma: 353.84 (slower???)
  chroma: 331.52
yuyv, sse2
  luma: 351.86
  chroma: 344.76
yuyv, mmx
  luma: 459.36
  chroma: 899.52
yuyv, old mmx
  luma: 491.98
  chroma: 901.00

uyvy, avx
  luma: 351.34 (same code as sse2 version, so should be same)
  chroma: 337.38
uyvy, sse2
  luma: 351.72
  chroma: 346.14
uyvy, mmx
  luma: 507.98
  chroma: 637.32
uyvy, old mmx
  luma: 513.04
  chroma: 663.64

nv12, avx
  chroma: 259.76
nv12, sse2
  chroma: 275.96
nv12, mmx
  chroma: 384.12
nv12, old mmx
  chroma: 387.90

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
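
For readers unfamiliar with the packed formats being benchmarked, here is a minimal scalar sketch of what the luma functions compute. It is not the code from the patch: the names yuyv_to_y_ref and uyvy_to_y_ref are hypothetical, and the only thing it relies on is the standard byte order of the two formats (YUYV is Y0 U Y1 V, UYVY is U Y0 V Y1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the libswscale implementation.
 * YUYV stores two pixels as Y0 U Y1 V, so luma sits at even byte offsets. */
void yuyv_to_y_ref(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++)
        dst[i] = src[2 * i];
}

/* UYVY stores two pixels as U Y0 V Y1, so luma sits at odd byte offsets. */
void uyvy_to_y_ref(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++)
        dst[i] = src[2 * i + 1];
}
```

In SIMD code, picking the even bytes is commonly done by masking 16-bit words with an AND, and picking the odd bytes by a right shift of 8. If the patch follows that pattern, which is an assumption here, it would fit the observation that only the yuyv luma path has a separate AVX variant using vpand, while the uyvy AVX luma path is the same code as the SSE2 one.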
