On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vign...@gmail.com>:
>
>> Around 20% faster (on a "benchmark cmd", who test pix_fmt conversion)
>> (4.2s with the patch, 5.2s without)
>>
>> Pass fate test for me.
>>
>> Checkasm result :
>> uyvytoyuv422_c: 14146.6
>> uyvytoyuv422_mmx: 13696.4
>> uyvytoyuv422_mmxext: 19395.9
>
> Something looks wrong here...
>
> Carl Eugen

On a Haswell using GCC I get

uyvytoyuv422_c: 44884.2
uyvytoyuv422_mmx: 15284.5
uyvytoyuv422_mmxext: 28656.5
uyvytoyuv422_sse2: 10921.8
uyvytoyuv422_avx: 10606.5

Martin is using a Clang version that is for some reason ignoring our
attempts at disabling tree vectorization, so the compiler optimizes his C
function with SIMD, hence the good result.

The mmxext version being slower than the mmx one, however, seems to be an
existing issue in the tree, which we should probably deal with. Unless of
course the test is wrong.
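
For illustration only, here is a minimal sketch of the kind of scalar loop
a C reference version of this conversion boils down to. It is not the code
in the tree: the function name and signature are hypothetical, and it
handles a single row, assuming packed UYVY input (byte order U0 Y0 V0 Y1)
and an even width. A plain byte deinterleave like this is the sort of loop
an auto-vectorizer may turn into SIMD when vectorization is not actually
disabled, which would be consistent with Martin's C number landing so close
to his mmx one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's implementation: unpack one row of
 * packed UYVY (U0 Y0 V0 Y1 ...) into separate Y, U and V planes.
 * width is in pixels and is assumed to be even. */
static void uyvy_row_to_planar(const uint8_t *src, uint8_t *y,
                               uint8_t *u, uint8_t *v, size_t width)
{
    /* Each 4-byte group carries two luma samples and one chroma pair. */
    for (size_t i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0];
        y[2 * i + 0] = src[4 * i + 1];
        v[i]         = src[4 * i + 2];
        y[2 * i + 1] = src[4 * i + 3];
    }
}
```

Building something like this with and without the compiler's vectorizer
enabled and comparing the timings would be one way to check whether the
option is being honoured.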