Hi,

On Sat, Jan 7, 2012 at 9:06 PM, Ronald S. Bultje <[email protected]> wrote:
> Also implement SSE2/AVX variants.
> ---
>  libswscale/Makefile               |    3 +-
>  libswscale/x86/input.asm          |  249 +++++++++++++++++++++++++++++++++++++
>  libswscale/x86/swscale_mmx.c      |   87 +++++++++++++
>  libswscale/x86/swscale_template.c |  163 ------------------------
>  4 files changed, 338 insertions(+), 164 deletions(-)
>  create mode 100644 libswscale/x86/input.asm
[..]

I measured this. Interestingly, vpand appears to hurt rather than
help, so I'll probably drop it, and with it the yuyvToY_avx function.
In all other cases the new MMX code is slightly faster than the old
inline asm, and the SSE2 and AVX versions are much faster.
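
For readers unfamiliar with the layout, here is a minimal scalar sketch
of what the yuyv luma and chroma input functions compute. It is not the
code from the patch: the function names are hypothetical, and the
masked variant only assumes that the SIMD version isolates luma by
AND-ing each 16-bit word with 0x00FF, which is what the vpand remark
suggests.

```python
def yuyv_to_y(src: bytes, width: int) -> bytes:
    """Luma of a packed YUYV row (Y0 U Y1 V ...): every even byte."""
    return bytes(src[2 * i] for i in range(width))


def yuyv_to_y_masked(src: bytes, width: int) -> bytes:
    """Same result, written the way a mask-based SIMD loop would see it.

    Assumption: each little-endian 16-bit word holds (chroma << 8) | luma,
    so AND-ing with 0x00FF keeps only the luma byte.
    """
    out = bytearray()
    for i in range(width):
        word = src[2 * i] | (src[2 * i + 1] << 8)
        out.append(word & 0x00FF)
    return bytes(out)


def yuyv_to_uv(src: bytes, chroma_width: int) -> tuple[bytes, bytes]:
    """Chroma of a packed YUYV row: U at byte 4*i + 1, V at byte 4*i + 3."""
    u = bytes(src[4 * i + 1] for i in range(chroma_width))
    v = bytes(src[4 * i + 3] for i in range(chroma_width))
    return u, v
```

For uyvy the byte order within each pair is swapped, so luma sits in
the odd bytes instead.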

Numbers (lower is faster):

yuyv, avx
luma: 353.84 (slower???)
chroma: 331.52

yuyv, sse2
luma: 351.86
chroma: 344.76

yuyv, mmx
luma: 459.36
chroma: 899.52

yuyv, old mmx
luma: 491.98
chroma: 901.00

uyvy, avx
luma: 351.34 (same code as sse2 version, so should be same)
chroma: 337.38

uyvy, sse2
luma: 351.72
chroma: 346.14

uyvy, mmx
luma: 507.98
chroma: 637.32

uyvy, old mmx
luma: 513.04
chroma: 663.64

nv12, avx
chroma: 259.76

nv12, sse2
chroma: 275.96

nv12, mmx
chroma: 384.12

nv12, old mmx
chroma: 387.90
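
To make the comparison easier to read, the numbers above can be turned
into speedup ratios relative to the old MMX inline asm. This is a small
sketch: the timings are copied from this mail, the units are not stated
here, and "lower is faster" is an assumption based on the "slower???"
remark. The helper names are made up for illustration.

```python
# Timings copied from the numbers above (units unspecified; assumed
# lower is faster).
TIMINGS = {
    ("yuyv", "luma"): {"avx": 353.84, "sse2": 351.86, "mmx": 459.36, "old mmx": 491.98},
    ("yuyv", "chroma"): {"avx": 331.52, "sse2": 344.76, "mmx": 899.52, "old mmx": 901.00},
    ("uyvy", "luma"): {"avx": 351.34, "sse2": 351.72, "mmx": 507.98, "old mmx": 513.04},
    ("uyvy", "chroma"): {"avx": 337.38, "sse2": 346.14, "mmx": 637.32, "old mmx": 663.64},
    ("nv12", "chroma"): {"avx": 259.76, "sse2": 275.96, "mmx": 384.12, "old mmx": 387.90},
}


def speedup(old: float, new: float) -> float:
    """Ratio > 1.0 means the new code is faster than the old code."""
    return old / new


def speedups_vs_old_mmx(timings):
    """Map each (format, plane) to the per-implementation speedup ratios."""
    result = {}
    for key, by_impl in timings.items():
        base = by_impl["old mmx"]
        result[key] = {
            impl: speedup(base, t)
            for impl, t in by_impl.items()
            if impl != "old mmx"
        }
    return result


if __name__ == "__main__":
    for (fmt, plane), ratios in speedups_vs_old_mmx(TIMINGS).items():
        line = ", ".join(f"{impl}: {r:.2f}x" for impl, r in ratios.items())
        print(f"{fmt} {plane}: {line}")
```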

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
