Hi,

On Tue, Jan 24, 2012 at 7:21 PM, Ronald S. Bultje <[email protected]> wrote:
> Also implement sse2/ssse3/avx versions.
> ---
>  libswscale/x86/input.asm          |  299 +++++++++++++++++++++++++++++++++++++
>  libswscale/x86/swscale_mmx.c      |   48 ++++---
>  libswscale/x86/swscale_template.c |  159 +-------------------
>  3 files changed, 328 insertions(+), 178 deletions(-)

First, a note on the FIXME Jason mentioned earlier: the SSSE3 version is
not Atom-friendly because it uses 4x pshufb. I may have to put that
assignment under an if(!atom), but I don't have a way of testing Atom
performance to see which is faster there: SSSE3, SSE2, or a modified
SSSE3 that uses 2x pshufb + 4x punpckl/hbw.

Perf numbers (obe2) for the rgb24 input functions:

./avconv -threads 1 -i ~/sintel_trailer_480p_vp8_vorbis.webm -sws_flags +full_chroma_inp+full_chroma_int+bitexact -vf format=rgb24,scale=800:600 -f md5 -nostats -vframes 500 -an -

Summary in cycles (each value is the mean of the five decicycle readings
listed under "Full numbers", divided by 10):

c:       chroma 6403, luma 3171
old mmx: chroma 3481, luma 1701
new mmx: chroma 3391, luma 1684
sse2:    chroma 1691, luma  951
ssse3:   chroma 1630, luma  848
avx:     chroma 1495, luma  783

Full numbers:

c: chroma 6403, luma 3171
64380 decicycles in rgbtouv, 131072 runs, 0 skips
63944 decicycles in rgbtouv, 131072 runs, 0 skips
63882 decicycles in rgbtouv, 131072 runs, 0 skips
63946 decicycles in rgbtouv, 131072 runs, 0 skips
63973 decicycles in rgbtouv, 131072 runs, 0 skips
31695 decicycles in rgbtoy, 131052 runs, 20 skips
31658 decicycles in rgbtoy, 131038 runs, 34 skips
31632 decicycles in rgbtoy, 131049 runs, 23 skips
31667 decicycles in rgbtoy, 131060 runs, 12 skips
31921 decicycles in rgbtoy, 131030 runs, 42 skips

old mmx: chroma 3481, luma 1701
35031 decicycles in rgbtouv, 131015 runs, 57 skips
34654 decicycles in rgbtouv, 130995 runs, 77 skips
34688 decicycles in rgbtouv, 130997 runs, 75 skips
35027 decicycles in rgbtouv, 131008 runs, 64 skips
34673 decicycles in rgbtouv, 130988 runs, 84 skips
16907 decicycles in rgbtoy, 131037 runs, 35 skips
17163 decicycles in rgbtoy, 131029 runs, 43 skips
16968 decicycles in rgbtoy, 131036 runs, 36 skips
16873 decicycles in rgbtoy, 131022 runs, 50 skips
17160 decicycles in rgbtoy, 131034 runs, 38 skips

new mmx: chroma 3391, luma 1684
33948 decicycles in rgbtouv, 131051 runs, 21 skips
33910 decicycles in rgbtouv, 131054 runs, 18 skips
33883 decicycles in rgbtouv, 131050 runs, 22 skips
33899 decicycles in rgbtouv, 131041 runs, 31 skips
33905 decicycles in rgbtouv, 131043 runs, 29 skips
16845 decicycles in rgbtoy, 131060 runs, 12 skips
16807 decicycles in rgbtoy, 131058 runs, 14 skips
16837 decicycles in rgbtoy, 131054 runs, 18 skips
16879 decicycles in rgbtoy, 131055 runs, 17 skips
16821 decicycles in rgbtoy, 131061 runs, 11 skips

sse2: chroma 1691, luma 951
16893 decicycles in rgbtouv, 131027 runs, 45 skips
16877 decicycles in rgbtouv, 131030 runs, 42 skips
16883 decicycles in rgbtouv, 131025 runs, 47 skips
16923 decicycles in rgbtouv, 131026 runs, 46 skips
16953 decicycles in rgbtouv, 131030 runs, 42 skips
9534 decicycles in rgbtoy, 131044 runs, 28 skips
9543 decicycles in rgbtoy, 131046 runs, 26 skips
9477 decicycles in rgbtoy, 131045 runs, 27 skips
9510 decicycles in rgbtoy, 131048 runs, 24 skips
9504 decicycles in rgbtoy, 131047 runs, 25 skips

ssse3: chroma 1630, luma 848
16315 decicycles in rgbtouv, 131035 runs, 37 skips
16285 decicycles in rgbtouv, 131015 runs, 57 skips
16304 decicycles in rgbtouv, 131027 runs, 45 skips
16282 decicycles in rgbtouv, 131024 runs, 48 skips
16312 decicycles in rgbtouv, 131014 runs, 58 skips
8477 decicycles in rgbtoy, 131034 runs, 38 skips
8473 decicycles in rgbtoy, 131033 runs, 39 skips
8488 decicycles in rgbtoy, 131039 runs, 33 skips
8467 decicycles in rgbtoy, 131049 runs, 23 skips
8491 decicycles in rgbtoy, 131051 runs, 21 skips

avx: chroma 1495, luma 783
14942 decicycles in rgbtouv, 131034 runs, 38 skips
14951 decicycles in rgbtouv, 131029 runs, 43 skips
14960 decicycles in rgbtouv, 131030 runs, 42 skips
14946 decicycles in rgbtouv, 131014 runs, 58 skips
14942 decicycles in rgbtouv, 131027 runs, 45 skips
7814 decicycles in rgbtoy, 131041 runs, 31 skips
7828 decicycles in rgbtoy, 131049 runs, 23 skips
7824 decicycles in rgbtoy, 131046 runs, 26 skips
7831 decicycles in rgbtoy, 131048 runs, 24 skips
7834 decicycles in rgbtoy, 131041 runs, 31 skips

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
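For reference, the per-variant summary values can be reproduced from the full numbers: each one is the mean of the five decicycle readings, divided by 10 to get cycles. The C sketch below does that arithmetic on the readings copied from this mail and also prints the speedup over the C version. The struct, table and function names (runs, mean_cycles, print_summary) are made up for this illustration.

```c
#include <stdio.h>

/* The five decicycle readings per function, copied from the full numbers. */
struct bench {
    const char *name;
    long uv[5]; /* rgbtouv (chroma) */
    long y[5];  /* rgbtoy (luma) */
};

static const struct bench runs[] = {
    { "c",       { 64380, 63944, 63882, 63946, 63973 },
                 { 31695, 31658, 31632, 31667, 31921 } },
    { "old mmx", { 35031, 34654, 34688, 35027, 34673 },
                 { 16907, 17163, 16968, 16873, 17160 } },
    { "new mmx", { 33948, 33910, 33883, 33899, 33905 },
                 { 16845, 16807, 16837, 16879, 16821 } },
    { "sse2",    { 16893, 16877, 16883, 16923, 16953 },
                 {  9534,  9543,  9477,  9510,  9504 } },
    { "ssse3",   { 16315, 16285, 16304, 16282, 16312 },
                 {  8477,  8473,  8488,  8467,  8491 } },
    { "avx",     { 14942, 14951, 14960, 14946, 14942 },
                 {  7814,  7828,  7824,  7831,  7834 } },
};

enum { NUM_RUNS = sizeof(runs) / sizeof(runs[0]) };

/* Mean of five decicycle readings, in cycles, rounded half up.
 * (sum / 5) / 10 == sum / 50, so add 25 before the integer division. */
static long mean_cycles(const long d[5])
{
    long sum = 0;
    for (int i = 0; i < 5; i++)
        sum += d[i];
    return (sum + 25) / 50;
}

/* Print each variant's summary line plus its speedup over the C version. */
static void print_summary(void)
{
    long c_uv = mean_cycles(runs[0].uv);
    long c_y  = mean_cycles(runs[0].y);

    for (int i = 0; i < NUM_RUNS; i++) {
        long uv = mean_cycles(runs[i].uv);
        long y  = mean_cycles(runs[i].y);
        printf("%-8s chroma %4ld (%.2fx), luma %4ld (%.2fx)\n",
               runs[i].name, uv, (double)c_uv / uv, y, (double)c_y / y);
    }
}
```

Calling print_summary() from a main() gives the same six summary lines as above. The only exact tie (C chroma, 6402.5 cycles) is rounded up, which matches the 6403 in the summary. By this measure AVX is roughly 4.3x the C version for chroma and 4.0x for luma.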
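To make the "4x pshufb" remark more concrete: pshufb gathers arbitrary bytes of a 16-byte register (or writes zeros), which is what makes it convenient for deinterleaving packed 3-byte RGB24 pixels. Below is a scalar C model of pshufb plus one illustrative mask that pulls the R bytes out of 16 bytes of RGB24 input and zero-extends them to 16-bit words. The mask and the helper names (pshufb_model, r_mask, extract_r_words) are made up for this illustration and are not necessarily the shuffles input.asm actually uses.

```c
#include <stdint.h>

/* Scalar model of SSSE3 pshufb on one 16-byte register: output byte i is
 * zero if bit 7 of mask[i] is set, otherwise src[mask[i] & 15].
 * (The real instruction can work in place; this model assumes dst != src.) */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
}

#define Z 0x80 /* mask value that produces a zero byte */

/* Illustrative mask: 16 bytes of packed RGB24 (R, G, B, R, G, B, ...) hold
 * R at offsets 0, 3, 6, 9, 12 and 15. Put each into the low byte of a
 * 16-bit word and zero the high bytes, giving six zero-extended R values. */
static const uint8_t r_mask[16] = {
    0, Z, 3, Z, 6, Z, 9, Z, 12, Z, 15, Z, Z, Z, Z, Z
};

/* Apply the shuffle, then read the result as little-endian 16-bit words
 * (as x86 does); the words are assembled explicitly so the host's
 * endianness does not matter. */
static void extract_r_words(uint16_t out[8], const uint8_t src[16])
{
    uint8_t tmp[16];

    pshufb_model(tmp, src, r_mask);
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(tmp[2 * i] | (tmp[2 * i + 1] << 8));
}
```

Masks starting at offsets 1 and 2 would gather G and B the same way. Zero-extending to words like this is the kind of layout that lets the coefficient multiply-adds run on 16-bit lanes afterwards.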
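One way the "if(!atom)" idea could look in the C init code, as a rough sketch only: the CPU flag names, function names and function signature below are hypothetical placeholders rather than libav's actual ones, and the function bodies are omitted. The point is just the selection order.

```c
#include <stdint.h>

/* Hypothetical CPU capability bits, for illustration only. */
enum {
    CPU_SSE2  = 1 << 0,
    CPU_SSSE3 = 1 << 1,
    CPU_AVX   = 1 << 2,
    CPU_ATOM  = 1 << 3, /* set on CPUs where pshufb is slow */
};

/* Hypothetical signature for an rgb24 -> luma input function. */
typedef void (*rgb24_to_y_fn)(int16_t *dst, const uint8_t *src, int width);

/* Placeholders: in real code these would be the C and SIMD versions. */
static void rgb24_to_y_c(int16_t *d, const uint8_t *s, int w)     { (void)d; (void)s; (void)w; }
static void rgb24_to_y_sse2(int16_t *d, const uint8_t *s, int w)  { (void)d; (void)s; (void)w; }
static void rgb24_to_y_ssse3(int16_t *d, const uint8_t *s, int w) { (void)d; (void)s; (void)w; }
static void rgb24_to_y_avx(int16_t *d, const uint8_t *s, int w)   { (void)d; (void)s; (void)w; }

/* Later checks override earlier ones, so the newest supported version
 * wins, except that SSSE3 (4x pshufb) is skipped on Atom, leaving SSE2. */
static rgb24_to_y_fn pick_rgb24_to_y(int cpu_flags)
{
    rgb24_to_y_fn fn = rgb24_to_y_c;

    if (cpu_flags & CPU_SSE2)
        fn = rgb24_to_y_sse2;
    if ((cpu_flags & CPU_SSSE3) && !(cpu_flags & CPU_ATOM))
        fn = rgb24_to_y_ssse3;
    if (cpu_flags & CPU_AVX)
        fn = rgb24_to_y_avx;
    return fn;
}
```

Whether falling back to SSE2 on Atom is actually faster is exactly the untested question above. The modified SSSE3 variant (2x pshufb + 4x punpckl/hbw) would simply be another candidate in this chain.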
