Hello again. I successfully compiled Cin-GG with OpenCL support via ffmpeg,
but I have no hardware to test it on (in theory I could install a CPU-only
OpenCL implementation, but then it would be sloooow ...)

Still, I found a curious document I want to share:

ENHANCED VIDEO
PROCESSING WITH
FFMPEG AND OPENCL
  
Kelvin Phan – Massey University.
2016

Unfortunately, it seems the site where this file was hosted is down at the
moment, and archive.org has no copy of it either ...

So I used the Google cache:

http://webcache.googleusercontent.com/search?q=cache:H-ISEoUS43UJ:https://kelvinphan.weebly.com/uploads/2/6/9/3/26933593/kelvin_phan_-_master%2560s_report.pdf%2BEnhanced+Video+Processing+with+FFMPEG+and+OpenCL&hl=ru&ct=clnk

The whole 45 pages are a bit too big for an email body, so I will just quote
a few excerpts, with the conclusion at the very end:

========

1. Introduction:
In the old implementation (OpenCL version 1.0 to 1.2), the host and the device
couldn’t share the same address space. To transfer memory data between host
and device, buffers need to be created. One buffer for transferring data from
host to device and another buffer for transferring data back from device to
host. An offset needs to be passed to and from devices for accessing a
location within a buffer such as accessing a region within an image. A host
memory pointer cannot be used on the device.
In OpenCL version 2.0, communication between host and device does not need to
have a copied buffer. Shared Virtual Memory (SVM), host and device may use the
same virtual address space. This address space can be effectively used to
share virtual pointers created in this space. The data structures which are
based on those memory pointers can be shared between host and device.

[...]

• OS: Ubuntu 14.04 (64 bits)
• CPU: AMD Phenom(tm) II X4 955 Processor × 4
• GPU: AMD Radeon (TM) R7 360 Series

[...]

On the other hand, the same kind of result was expected to happen with un-sharp
filter, but they was not. It is true that the filter with OpenCL enable gave
better performing comparing without OpenCL enable, but the performing was not
even nearly two times speed up as we see with de-shake filter. The possible
reason for this is that the program took a lot more time to transfer the data
between host buffers and device buffers, and there are a lot of transferring
which needs to do.
==============
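
A back-of-envelope sketch of that transfer-overhead argument (mine, not from
the report). It assumes each frame on the OpenCL path costs upload + kernel +
download, strictly one after another, and every timing below is a made-up
placeholder, not a measurement:

```python
def opencl_speedup(cpu_ms, kernel_ms, upload_ms, download_ms):
    """Speedup of the OpenCL path over the CPU path for one frame.

    Assumes the OpenCL path is strictly serial: upload, run kernel, download.
    """
    gpu_total_ms = upload_ms + kernel_ms + download_ms
    return cpu_ms / gpu_total_ms


# Hypothetical numbers, chosen only to illustrate the effect.
# Heavy filter: plenty of CPU work to offload, so transfers are affordable.
heavy = opencl_speedup(cpu_ms=20.0, kernel_ms=4.0, upload_ms=3.0, download_ms=3.0)
# Light filter: same transfer cost, but far less compute to offload.
light = opencl_speedup(cpu_ms=8.0, kernel_ms=1.0, upload_ms=3.0, download_ms=3.0)

print(f"heavy filter speedup: {heavy:.2f}x")
print(f"light filter speedup: {light:.2f}x")
```

With the same per-frame transfer cost, the lighter filter gains much less,
which is the pattern the report describes for un-sharp versus de-shake.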

Hm, so it was not as big a win as the author hoped ... yet those filters
appeared in mainline ffmpeg:

http://ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219828.html
[FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

-----------------
Changes since the last time this was posted:
* Add unsharp filter (to replace existing unsharp).
* Remove old experimental API.
* Miscellaneous fixes.

Now also tested with AMD OpenCL on Windows (DXVA2 mapping works nicely, D3D11 
does not because it wants the Intel extension for NV12 support).

Thanks,

- Mark


Silly example using everything (for i965 VAAPI + Beignet):

./ffmpeg_g -y -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format vaapi \
-i in.mp4 -f image2 -r 1 -i overlays/%d.png -an -filter_hw_device ocl \
-filter_complex '[1:v]format=yuva420p,hwupload[x2];
[0:v]scale_vaapi=1280:720:yuv420p,hwmap[x1];
[x1][x2]overlay_opencl=0:0,program_opencl=test.cl:rotate_image,unsharp_opencl=lx=17:ly=17:la=5,hwmap=derive_device=vaapi:reverse=1,scale_vaapi=1280:720:nv12' \
-c:v h264_vaapi -frames:v 1000 out.mp4

test.cl:

__kernel void rotate_image(__write_only image2d_t dst,
                           __read_only  image2d_t src,
                           unsigned int index)
{
  // Pixel (unnormalized) coordinates, linearly interpolated on read.
  const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
                             CLK_FILTER_LINEAR);

  // Rotation angle in radians, derived from index.
  float angle = (float)index / 100.0f;

  float2 dst_dim = convert_float2(get_image_dim(dst));
  float2 src_dim = convert_float2(get_image_dim(src));

  float2 dst_cen = dst_dim / 2;
  float2 src_cen = src_dim / 2;

  // One work-item per destination pixel.
  int2   dst_loc = (int2)(get_global_id(0), get_global_id(1));

  // Rotate the position around the destination centre ...
  float2 dst_pos = convert_float2(dst_loc) - dst_cen;
  float2 src_pos = {
    cos(angle) * dst_pos.x - sin(angle) * dst_pos.y,
    sin(angle) * dst_pos.x + cos(angle) * dst_pos.y
  };
  // ... then rescale from destination to source dimensions.
  src_pos = src_pos * src_dim / dst_dim;

  float2 src_loc = src_pos + src_cen;

  // Outside the source image: write mid-grey; otherwise sample the source.
  if (src_loc.x < 0         || src_loc.y < 0 ||
      src_loc.x > src_dim.x || src_loc.y > src_dim.y)
    write_imagef(dst, dst_loc, 0.5f);
  else
    write_imagef(dst, dst_loc, read_imagef(src, sampler, src_loc));
}
-----------
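
Since I cannot run the kernel here, a small CPU-side sketch of its coordinate
mapping may help to follow it. This is my own re-statement in Python, not part
of the patch; the function name src_location and the 1280x720 sizes are
assumptions for illustration only:

```python
import math


def src_location(dst_x, dst_y, dst_dim, src_dim, index):
    """Mirror the kernel's mapping from a destination pixel to a source position.

    Returns (src_x, src_y, inside), where inside is False when the kernel
    would write the constant 0.5 instead of sampling the source image.
    """
    angle = index / 100.0
    dst_cx, dst_cy = dst_dim[0] / 2.0, dst_dim[1] / 2.0
    src_cx, src_cy = src_dim[0] / 2.0, src_dim[1] / 2.0

    # Position relative to the destination centre.
    px, py = dst_x - dst_cx, dst_y - dst_cy

    # Rotate by `angle` radians.
    rx = math.cos(angle) * px - math.sin(angle) * py
    ry = math.sin(angle) * px + math.cos(angle) * py

    # Rescale from destination to source dimensions, then re-centre.
    sx = rx * src_dim[0] / dst_dim[0] + src_cx
    sy = ry * src_dim[1] / dst_dim[1] + src_cy

    inside = 0 <= sx <= src_dim[0] and 0 <= sy <= src_dim[1]
    return sx, sy, inside


size = (1280, 720)  # assumed frame size, same for source and destination
print(src_location(10, 20, size, size, 0))    # index 0: no rotation
print(src_location(0, 0, size, size, 79))     # corner pixel, roughly 45 degrees
```

With index 0 every pixel maps to itself, and the centre pixel maps to the
source centre for any index; corner pixels fall outside the source once the
angle grows, which is where the kernel writes grey.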

and another patch was added in 2018, for tonemapping:

https://patchwork.ffmpeg.org/patch/9032/
[FFmpeg-devel] lavfi: add opencl tonemap filter.

--------------
This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.

An example command to use this filter with vaapi codecs:
FFMPEG -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
vaapi -i INPUT -filter_hw_device ocl -filter_complex \
'[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
[x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2 OUTPUT

-------------------

So their command-line examples are quite complex, and I am not sure whether
libavcodec can construct such a pipeline automagically when it is called by
external code (cinelerra-GG in our case) ...
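
To see what the tonemap example actually chains together, here is a naive
sketch that splits its -filter_complex string into stages. The helper name
split_filtergraph is hypothetical, and the splitting is deliberately
simplistic: it ignores quoting and escaped separators, so it is not how ffmpeg
itself parses filtergraphs:

```python
import re


def split_filtergraph(graph):
    """Split a filtergraph string into chains of (filter_name, args) pairs.

    Naive: splits on ';' and ',' and strips [labels]. It does not handle
    quoting or escaped separators, which real filtergraphs may contain.
    """
    chains = []
    for chain in graph.split(";"):
        stages = []
        for item in chain.split(","):
            # Drop link labels such as [0:v] or [x1].
            item = re.sub(r"\[[^\]]*\]", "", item).strip()
            if not item:
                continue
            name, _, args = item.partition("=")
            stages.append((name, args))
        chains.append(stages)
    return chains


# Filtergraph copied from the tonemap example (line continuations removed).
graph = ("[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1];"
         "[x1]hwmap=derive_device=vaapi:reverse=1")

for chain in split_filtergraph(graph):
    print(" -> ".join(name for name, _ in chain))
```

So the graph itself is only three stages (map to OpenCL, tonemap, map back to
VAAPI); most of the command-line complexity is in the device setup options
around it.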


