Hi everyone,

Today I was analyzing the duration of memcpy calls in FFmpeg. I noticed that they take longer than an optimized SSE, SSE2, MMX, MMX2, AVX or AVX2 based memcpy.
I tried an FFmpeg build compiled with march=corei7-avx2, but it did not change the duration of the memcpy operations. I also followed https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips, with the same result. In addition, I tried gcc 6.2 in case gcc was not selecting the correct flags. Same result again.

These memcpy operations affect the decoding (and probably encoding) frame rate. In one case, an unscaled uyvy422 to p010 conversion of 3840x2160 rawvideo, the frame rate increased from 44 fps to 52 fps on a Xeon E5 2630 v4.

Am I missing anything when compiling FFmpeg for AVX2 or other optimization flags, or does FFmpeg need a fix that routes some (or all) memcpy operations through a common memcpy that selects the optimized implementation based on the CPU flags? Or is there no such need and I am on the wrong path?

(As a side note, FFmpeg performs better on i7 Extreme cores than on Xeon v4 processors.)

Kind regards,
Ali KIZIL

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel