On Tue, Apr 3, 2018 at 2:10 AM, James Almer <jamr...@gmail.com> wrote:
> On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
>> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vign...@gmail.com>:
>>
>>> Around 20% faster (on a "benchmark cmd", who test pix_fmt conversion)
>>> (4.2s with the patch, 5.2s without)
>>>
>>> Pass fate test for me.
>>>
>>> Checkasm result :
>>> uyvytoyuv422_c: 14146.6
>>> uyvytoyuv422_mmx: 13696.4
>>> uyvytoyuv422_mmxext: 19395.9
>>
>> Something looks wrong here...
>>
>> Carl Eugen
>
> On a Haswell using GCC i get
>
> uyvytoyuv422_c: 44884.2
> uyvytoyuv422_mmx: 15284.5
> uyvytoyuv422_mmxext: 28656.5
> uyvytoyuv422_sse2: 10921.8
> uyvytoyuv422_avx: 10606.5
>
> Martin is using a Clang version that is for some reason ignoring our
> attempts at disabling tree vectorization, so his C function is optimized
> with simd by the compiler, hence the good result.
>
> The mmxext version being slower than the mmx one seems however to be an
> existing issue in the tree, which we should probably deal with. Unless
> of course the test is wrong.

It's MMX; dealing with it would probably entail just deleting it. We can
keep the plain mmx version and remove the mmxext one, or perhaps just
remove both.

- Hendrik
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel