Hi all,

(This patch series is a bit long - some bits are incomplete (13-15 for Windows, 18 might need more thought generally, 22-24 are just a first attempt), but they are included here to help with making use of the other patches.)
The goal here is to get to the point where we can do transcoding entirely on opaque GPU frames, with support for arbitrary operations we might want to do there. To that end, we first add more support for mapping between hardware devices (including creating new devices), and then add OpenCL support to use for arbitrary transformations on the GPU.

OpenCL is tested on both Beignet and the Intel blob on gen9 (Skylake / Kaby Lake). They have different APIs for mapping, but otherwise work identically. It does not work with Clover (Mesa), due to lack of image support.

A fun example, which demonstrates a lot of the features added by the series:

./avconv -y -threads 1 \
    -init_hw_device vaapi=vadev:/dev/dri/renderD128 \
    -hwaccel vaapi -hwaccel_device vadev -hwaccel_output_format vaapi \
    -i in_1080i.mp4 \
    -f image2 -r 1 -i overlays/%d.png \
    -an -filter_hw_device opencl@vadev \
    -filter_complex '[1:v]format=rgba,hwupload[x2]; [0:v]deinterlace_vaapi,scale_vaapi=1280:720,hwmap[x1]; [x1][x2]overlay_opencl=320:180,hwmap=derive_device=qsv:reverse=1' \
    -c:v h264_qsv -r 60 -b 5M -maxrate 5M out.mp4

To break this down and explain what is going on:

-init_hw_device vaapi=vadev:/dev/dri/renderD128
    Create a new VAAPI device called "vadev" from "/dev/dri/renderD128".

-hwaccel vaapi -hwaccel_device vadev -hwaccel_output_format vaapi -i in_1080i.mp4
    First stream: decode with VAAPI on the "vadev" device we just created.

-f image2 -r 1 -i overlays/%d.png
    Second stream: some PNGs with alpha to overlay, input at 1fps.

-filter_hw_device opencl@vadev
    Create a new anonymous OpenCL device, derived from the VAAPI device "vadev", and use it for filtering.

-filter_complex '[1:v]format=rgba,hwupload[x2];
    Upload the overlay stream as RGBA to the filter device (i.e. the OpenCL one).

[0:v]deinterlace_vaapi,scale_vaapi=1280:720,hwmap[x1];
    Deinterlace and scale the main stream, then map it to the filter device (i.e. from VAAPI to the derived OpenCL device).
[x1][x2]overlay_opencl=320:180,
    Put the overlay onto the main stream at the given position (this requires that the two streams are different frames contexts on the same OpenCL device).

hwmap=derive_device=qsv:reverse=1'
    Derive a new QSV device from the current device (i.e. OpenCL to QSV: this operation isn't possible directly, but it will succeed by deriving from the parent VAAPI device of the OpenCL device). Also reverse the mapping, so we allocate in the QSV domain and map back to OpenCL (this is required with QSV because mapping OpenCL -> QSV is not supported; it isn't needed with Beignet, but it does make things faster because of tiling magic).

-c:v h264_qsv -r 60 -b 5M -maxrate 5M out.mp4
    Encode the output with QSV at CBR 5Mbps.

Patch breakdown:

 1-2:   Adds some support needed in the generic hwcontext code for device support.
 3:     Adds the generic hardware support for avconv.
 4:     Enables generic support for VAAPI in avconv.
 5-6:   (From wm4.) Adds hw_device_ctx support to VDPAU.
 7:     Enables generic support for VDPAU in avconv.
 8:     Adds an option to set/create a device for use in filter graphs.
 9-10:  Adds additional support in generic hwcontext for mapping frames contexts.
 11-12: Adds derivation and mapping support... rather, adds hwmap support for device derivation and improved context mapping.
 13-15: Adds derivation and mapping support for QSV <-> VAAPI. Also includes equivalent code to do the same for DXVA2 - not even compiled, but hopefully reflecting what it needs to do.
 16:    Enables generic support for QSV in avconv. QSV does not yet support hw_device_ctx initialisation, so the hwaccel code stays for now.
 17-21: Adds OpenCL support, with mapping from VAAPI and QSV.
 22-24: Adds some filtering code; incomplete, but enough to do fun things.

Also accessible as a whole here: <https://github.com/fhvwy/libav/tree/device>.

In addition to looking at the patches and testing, thoughts on any or all of the following are welcome:

* The device specification syntax.
* Other mapping cases which will be interesting (Nvidia, DXVA, ?).
  Is there anything special about them which should be taken into account now?
* I'm not an OpenCL expert at all, so anything about the hwcontext setup.
* OpenCL source code being included in lavfi. We need the source in the binary, as it gets compiled at run-time - currently I do this by preprocessor stringification, but something like objcopy might be cleaner (no idea how this would work on non-binutils platforms, though).

Thanks,

- Mark

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
