Hi,

On Tue, Jan 24, 2012 at 7:21 PM, Ronald S. Bultje <[email protected]> wrote:
> Also implement sse2/ssse3/avx versions.
> ---
>  libswscale/x86/input.asm          |  299 +++++++++++++++++++++++++++++++++++++
>  libswscale/x86/swscale_mmx.c      |   48 ++++---
>  libswscale/x86/swscale_template.c |  159 +-------------------
>  3 files changed, 328 insertions(+), 178 deletions(-)

First, a note on the FIXME Jason mentioned earlier: the SSSE3 version
is not Atom-friendly because it uses 4x pshufb. I may have to wrap that
assignment in if(!atom), but I don't have a way of testing Atom
performance to see which is faster there: SSSE3, SSE2, or a modified
SSSE3 that uses 2x pshufb + 4x punpckl/hbw.

Perf numbers (obe2) for rgb24 input functions:

./avconv -threads 1 -i ~/sintel_trailer_480p_vp8_vorbis.webm
-sws_flags +full_chroma_inp+full_chroma_int+bitexact -vf
format=rgb24,scale=800:600 -f md5 -nostats -vframes 500 -an -

c: chroma 6403, luma 3171
old mmx: chroma 3481, luma 1701
new mmx: chroma 3391, luma 1684
sse2: chroma 1691, luma 951
ssse3: chroma 1630, luma 848
avx: chroma 1495, luma 783
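
For a quick read of the relative gains, the summary figures can be turned
into speedup ratios. This is only a minimal sketch: the table is copied
from the summary above, and the `results` dict and `speedup` helper are
illustrative names, not part of the patch.

```python
# Per-call costs copied from the summary above (lower is faster).
results = {
    "c":       {"chroma": 6403, "luma": 3171},
    "old mmx": {"chroma": 3481, "luma": 1701},
    "new mmx": {"chroma": 3391, "luma": 1684},
    "sse2":    {"chroma": 1691, "luma": 951},
    "ssse3":   {"chroma": 1630, "luma": 848},
    "avx":     {"chroma": 1495, "luma": 783},
}

def speedup(baseline, candidate, plane):
    """Return how many times faster `candidate` is than `baseline`."""
    return results[baseline][plane] / results[candidate][plane]

if __name__ == "__main__":
    for name in results:
        print("%-8s chroma %.2fx, luma %.2fx vs. c" % (
            name, speedup("c", name, "chroma"), speedup("c", name, "luma")))
```

With these inputs, avx comes out at roughly 4.3x the C chroma speed and
about 4x for luma, and sse2 is roughly 2x the new mmx chroma speed.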

Full numbers:

c: chroma 6403, luma 3171
64380 decicycles in rgbtouv, 131072 runs, 0 skips
63944 decicycles in rgbtouv, 131072 runs, 0 skips
63882 decicycles in rgbtouv, 131072 runs, 0 skips
63946 decicycles in rgbtouv, 131072 runs, 0 skips
63973 decicycles in rgbtouv, 131072 runs, 0 skips
31695 decicycles in rgbtoy, 131052 runs, 20 skips
31658 decicycles in rgbtoy, 131038 runs, 34 skips
31632 decicycles in rgbtoy, 131049 runs, 23 skips
31667 decicycles in rgbtoy, 131060 runs, 12 skips
31921 decicycles in rgbtoy, 131030 runs, 42 skips

old mmx: chroma 3481, luma 1701
35031 decicycles in rgbtouv, 131015 runs, 57 skips
34654 decicycles in rgbtouv, 130995 runs, 77 skips
34688 decicycles in rgbtouv, 130997 runs, 75 skips
35027 decicycles in rgbtouv, 131008 runs, 64 skips
34673 decicycles in rgbtouv, 130988 runs, 84 skips
16907 decicycles in rgbtoy, 131037 runs, 35 skips
17163 decicycles in rgbtoy, 131029 runs, 43 skips
16968 decicycles in rgbtoy, 131036 runs, 36 skips
16873 decicycles in rgbtoy, 131022 runs, 50 skips
17160 decicycles in rgbtoy, 131034 runs, 38 skips

new mmx: chroma 3391, luma 1684
33948 decicycles in rgbtouv, 131051 runs, 21 skips
33910 decicycles in rgbtouv, 131054 runs, 18 skips
33883 decicycles in rgbtouv, 131050 runs, 22 skips
33899 decicycles in rgbtouv, 131041 runs, 31 skips
33905 decicycles in rgbtouv, 131043 runs, 29 skips
16845 decicycles in rgbtoy, 131060 runs, 12 skips
16807 decicycles in rgbtoy, 131058 runs, 14 skips
16837 decicycles in rgbtoy, 131054 runs, 18 skips
16879 decicycles in rgbtoy, 131055 runs, 17 skips
16821 decicycles in rgbtoy, 131061 runs, 11 skips

sse2: chroma 1691, luma 951
16893 decicycles in rgbtouv, 131027 runs, 45 skips
16877 decicycles in rgbtouv, 131030 runs, 42 skips
16883 decicycles in rgbtouv, 131025 runs, 47 skips
16923 decicycles in rgbtouv, 131026 runs, 46 skips
16953 decicycles in rgbtouv, 131030 runs, 42 skips
9534 decicycles in rgbtoy, 131044 runs, 28 skips
9543 decicycles in rgbtoy, 131046 runs, 26 skips
9477 decicycles in rgbtoy, 131045 runs, 27 skips
9510 decicycles in rgbtoy, 131048 runs, 24 skips
9504 decicycles in rgbtoy, 131047 runs, 25 skips

ssse3: chroma 1630, luma 848
16315 decicycles in rgbtouv, 131035 runs, 37 skips
16285 decicycles in rgbtouv, 131015 runs, 57 skips
16304 decicycles in rgbtouv, 131027 runs, 45 skips
16282 decicycles in rgbtouv, 131024 runs, 48 skips
16312 decicycles in rgbtouv, 131014 runs, 58 skips
8477 decicycles in rgbtoy, 131034 runs, 38 skips
8473 decicycles in rgbtoy, 131033 runs, 39 skips
8488 decicycles in rgbtoy, 131039 runs, 33 skips
8467 decicycles in rgbtoy, 131049 runs, 23 skips
8491 decicycles in rgbtoy, 131051 runs, 21 skips

avx: chroma 1495, luma 783
14942 decicycles in rgbtouv, 131034 runs, 38 skips
14951 decicycles in rgbtouv, 131029 runs, 43 skips
14960 decicycles in rgbtouv, 131030 runs, 42 skips
14946 decicycles in rgbtouv, 131014 runs, 58 skips
14942 decicycles in rgbtouv, 131027 runs, 45 skips
7814 decicycles in rgbtoy, 131041 runs, 31 skips
7828 decicycles in rgbtoy, 131049 runs, 23 skips
7824 decicycles in rgbtoy, 131046 runs, 26 skips
7831 decicycles in rgbtoy, 131048 runs, 24 skips
7834 decicycles in rgbtoy, 131041 runs, 31 skips
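
The summary figures appear to be the mean of the five decicycle readings
per function, divided by ten. A minimal sketch of that reduction follows,
assuming the log line format shown above; `mean_cycles` and `c_log` are
illustrative names, and the sample text is the "c" block copied from the
full numbers.

```python
import re
from collections import defaultdict

# Matches lines such as "64380 decicycles in rgbtouv, 131072 runs, 0 skips".
LINE = re.compile(r"(\d+) decicycles in (\w+), (\d+) runs, (\d+) skips")

def mean_cycles(log_text):
    """Average the decicycle readings per function and convert to cycles."""
    samples = defaultdict(list)
    for match in LINE.finditer(log_text):
        samples[match.group(2)].append(int(match.group(1)))
    return {name: sum(vals) / len(vals) / 10 for name, vals in samples.items()}

# The "c" block from the full numbers above.
c_log = """
64380 decicycles in rgbtouv, 131072 runs, 0 skips
63944 decicycles in rgbtouv, 131072 runs, 0 skips
63882 decicycles in rgbtouv, 131072 runs, 0 skips
63946 decicycles in rgbtouv, 131072 runs, 0 skips
63973 decicycles in rgbtouv, 131072 runs, 0 skips
31695 decicycles in rgbtoy, 131052 runs, 20 skips
31658 decicycles in rgbtoy, 131038 runs, 34 skips
31632 decicycles in rgbtoy, 131049 runs, 23 skips
31667 decicycles in rgbtoy, 131060 runs, 12 skips
31921 decicycles in rgbtoy, 131030 runs, 42 skips
"""

if __name__ == "__main__":
    for name, cycles in mean_cycles(c_log).items():
        print("%s: %.1f cycles" % (name, cycles))
```

For the "c" block this gives 6402.5 for rgbtouv and about 3171.5 for
rgbtoy, which lines up with the "chroma 6403, luma 3171" summary line.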