On 2017/4/12 5:00, Mark Thompson wrote:
> On 11/04/17 12:26, Mark Thompson wrote:
>> On 11/04/17 08:30, Jun Zhao wrote:
>>> From 9bab458006369f427fa2f4c6248ee89329e81067 Mon Sep 17 00:00:00 2001
>>> From: Jun Zhao <jun.z...@intel.com>
>>> Date: Tue, 11 Apr 2017 14:37:07 +0800
>>> Subject: [PATCH] hwcontext_vaapi: use the special UC copy for downloading
>>> frames.
>>>
>>> Use the SSE4 UC copy function to copy image data from GPU-mapped
>>> memory; see
>>> https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
>>>
>>> Before this change, the VA-API hwaccel decoder copied image data from
>>> GPU-mapped memory using vaCreateImage/vaGetImage/av_frame_copy; it now
>>> uses vaDeriveImage/av_image_copy_uc_from.
>>>
>>> For a 3000-frame 1080p H.264 stream decoded on an Intel(R) Core(TM)
>>> i5-6260U CPU @ 1.80GHz, the CPU usage and decode speed are as follows:
>>>
>>> 1. Software decoder.
>>> ./ffmpeg -i ./skyfall2-trailer.mp4 -f null /dev/null
>>>
>>> CPU: 80%, fps: 334fps
>>>
>>> 2a. vaCreateImage/vaGetImage/av_frame_copy
>>> ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i
>>> skyfall2-trailer.mp4 -f null /dev/null
>>>
>>> CPU: 12%, fps: 147fps
>>>
>>> 2b. vaDeriveImage/av_image_copy_uc_from
>>> ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i
>>> skyfall2-trailer.mp4 -f null /dev/null
>>>
>>> CPU: 23%, fps: 628fps
>>>
>>> Signed-off-by: Jun Zhao <jun.z...@intel.com>
>>> ---
>>
>> This change was considered in libav when the UC copy function was introduced
>> (<https://lists.libav.org/pipermail/libav-devel/2016-August/078826.html>,
>> <https://lists.libav.org/pipermail/libav-devel/2016-August/078825.html>),
>> but was not in the end applied.
>>
>> The reasons for this were:
>>
>> * It had much worse performance on the low-power cores - try your benchmark
>> above on Braswell.
>
> Running on a Braswell N3700, input is 38072 frames of 1920x1080 H.264.
>
> No download at all:      520fps,   52s CPU
> Before patch, 4 threads: 107fps,  237s CPU
> Before patch, 1 thread:   90fps,  233s CPU
> After patch, 4 threads:   30fps, 1294s CPU
> After patch, 1 thread:    28fps, 1305s CPU
>
I will try to reproduce this on Braswell (BSW).

> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel