Michael Niedermayer: > On Tue, Aug 23, 2022 at 07:28:19PM +0200, Andreas Rheinhardt wrote: >> Michael Niedermayer: >>> On Mon, Aug 22, 2022 at 11:59:17PM +0200, Andreas Rheinhardt wrote: >>>> Andreas Rheinhardt: >>>>> Fixes FATE-failures with the the filter-2xbr filter-3xbr filter-4xbr >>>>> filter-ep2x filter-ep3x filter-hq2x filter-hq3x filter-hq4x >>>>> filter-paletteuse-bayer filter-paletteuse-bayer0 >>>>> filter-paletteuse-nodither and filter-paletteuse-sierra2_4a tests >>>>> when using 32bit x86 with CPUFLAGS ranging from "mmx+mmxext" to >>>>> "mmx+mmxext+sse+sse2+sse3" (the relevant function is only overwritten >>>>> when using SSSE3). >>>>> >>>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinha...@outlook.com> >>>>> --- >>>>> libswscale/x86/rgb_2_rgb.asm | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/libswscale/x86/rgb_2_rgb.asm b/libswscale/x86/rgb_2_rgb.asm >>>>> index c695c61d5c..76ca1eec03 100644 >>>>> --- a/libswscale/x86/rgb_2_rgb.asm >>>>> +++ b/libswscale/x86/rgb_2_rgb.asm >>>>> @@ -104,6 +104,7 @@ jge .end >>>>> jl .loop_simd >>>>> >>>>> .end: >>>>> + emms >>>>> RET >>>>> >>>>> >>>>> ;------------------------------------------------------------------------------ >>>> >>>> I'd really love if someone with x86 assembly skills could look over this >>>> trivial patch and confirm whether it is indeed correct. All I currently >>>> know is that is works for me. >>> >>> emms needs to be called between MMX and float code, as far outside of loops >>> as possible >>> that would suggest outside the for() loops in rgbToRgbWrapper() and any >>> other code using it. >> >> But there is another aspect that the above is missing: Namely that if >> emms_c() is put outside of MMX functions, then it will be called even >> when it is unnecessary. In this case it is unnecessary for all modern >> CPUs, as this function is overridden when SSSE3 is available. > > If you dont like that, > dont call it when its not needed or call it a few hundread times unnecessary > like your patch does. > or write only code that doesnt need emms > maybe there are more options ... >
If emms_c() is used as now outside of MMX functions, then a "dont call it when its not needed" would involve a check and would therefore still incur cost for users who don't use this. Also it is unclear how such a check would even look like given that one can use av_force_cpu_flags(). See also 55fc2c5a892c50feb1b9a8f55b74ec6594755ddb. This patch also only calls it a few hundred times unnecessarily if one runs this without SSSE3. CPUs without SSSE3 are ancient today. For the non-ancient CPUs, using emms_c() adds an EMMS. - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".