On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vign...@gmail.com>:
> 
>> Around 20% faster  (on a "benchmark cmd", who test pix_fmt conversion)
>> (4.2s with the patch, 5.2s without)
>>
>> Pass fate test for me.
>>
>> Checkasm result :
>> uyvytoyuv422_c: 14146.6
>> uyvytoyuv422_mmx: 13696.4
>> uyvytoyuv422_mmxext: 19395.9
> 
> Something looks wrong here...
> 
> Carl Eugen

On a Haswell using GCC, I get:

uyvytoyuv422_c: 44884.2
uyvytoyuv422_mmx: 15284.5
uyvytoyuv422_mmxext: 28656.5
uyvytoyuv422_sse2: 10921.8
uyvytoyuv422_avx: 10606.5

Martin is using a Clang version that, for some reason, ignores our
attempts to disable tree vectorization, so the compiler auto-vectorizes
his C function with SIMD, hence the good result.
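
To illustrate the kind of loop involved, here is a minimal sketch of a
scalar UYVY deinterleave. It is not the actual uyvytoyuv422_c code; the
function name uyvy_row_to_planar and its signature are made up for this
example. A loop of this shape is something an optimizing compiler may
turn into SIMD on its own. One way to check would be to compare builds
with and without -fno-tree-vectorize (GCC) or -fno-vectorize
-fno-slp-vectorize (Clang); whether that is what happens in Martin's
setup is only an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, NOT FFmpeg's uyvytoyuv422_c:
 * split one row of packed UYVY (U0 Y0 V0 Y1 ...) into planar Y, U, V.
 * width is the number of luma samples in the row. */
static void uyvy_row_to_planar(const uint8_t *src, uint8_t *y,
                               uint8_t *u, uint8_t *v, int width)
{
    assert(width % 2 == 0); /* 4:2:2 stores pixels in pairs */

    for (int i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0]; /* shared chroma U */
        y[2 * i]     = src[4 * i + 1]; /* first luma sample */
        v[i]         = src[4 * i + 2]; /* shared chroma V */
        y[2 * i + 1] = src[4 * i + 3]; /* second luma sample */
    }
}
```

If the compiler vectorizes this loop, the "_c" number stops measuring
plain scalar code, which would make it look close to the hand-written
versions.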

The mmxext version being slower than the mmx one, however, seems to be
an existing issue in the tree, which we should probably deal with.
Unless, of course, the test is wrong.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
