Hi, Andriy

----- Original Message -----
> From: "Andriy Gelman" <andriy.gel...@gmail.com>
> To: "FFmpeg development discussions and patches" <ffmpeg-devel@ffmpeg.org>
> Cc: xuju...@sjtu.edu.cn
> Sent: Monday, December 23, 2019 12:50:48 AM
> Subject: Re: [FFmpeg-devel] [PATCH v2 2/3] avfilter/vf_convolution: Add x86 SIMD optimizations for filter_row()

> Xu,
>
> On Sun, 22. Dec 16:37, xuju...@sjtu.edu.cn wrote:
>> From: Xu Jun <xuju...@sjtu.edu.cn>
>>
>> Read 16 elements from memory, shuffle and parallelly compute 4 rows at a time,
>> shuffle and parallelly write 16 results to memory.
>> Performance improves about 15% compared to v1.
>>
>> Tested using this command:
>> ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45:1/45:1:2:3:4:row:row:row:row" -an -vframes 5000 -f null /dev/null -benchmark
>>
>> after patch:
>> frame= 4317 fps=622 q=-0.0 Lsize=N/A time=00:02:52.68 bitrate=N/A speed=24.9x
>> video:2260kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
>> bench: utime=20.539s stime=1.834s rtime=6.943s
>>
>> before patch(c version):
>> frame= 4317 fps=306 q=-0.0 Lsize=N/A time=00:02:52.68 bitrate=N/A speed=12.2x
>> video:2260kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
>> bench: utime=60.591s stime=1.787s rtime=14.100s
>>
>> Signed-off-by: Xu Jun <xuju...@sjtu.edu.cn>
>> ---
>> libavfilter/x86/vf_convolution.asm | 131 ++++++++++++++++++++++++++
>> libavfilter/x86/vf_convolution_init.c | 9 ++
>> 2 files changed, 140 insertions(+)
>> mode change 100644 => 100755 libavfilter/x86/vf_convolution.asm
>>
>> diff --git a/libavfilter/x86/vf_convolution.asm b/libavfilter/x86/vf_convolution.asm
>> old mode 100644
>> new mode 100755
>> index 754d4d1064..2a09374b00
>> --- a/libavfilter/x86/vf_convolution.asm
>> +++ b/libavfilter/x86/vf_convolution.asm
>> @@ -154,3 +154,134 @@ cglobal filter_3x3, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c
>> INIT_XMM sse4
>> FILTER_3X3
>> %endif
>> +
>
> Patch 2-3 are failing to build:
> https://unofficial.patchwork-ffmpeg.org/project/FFmpeg/list/?series=26
>
> --
> Andriy

I'm sorry, I did not build each patch independently.
There seem to be some problems with the dependencies between the patches. I'll fix them in v3.

Xu Jun