> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Timo Rothenpieler
> Sent: Monday, March 25, 2019 6:31 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH][FFmpeg-devel v2] Add GPU accelerated
> video crop filter
>
> On 25/03/2019 09:27, Tao Zhang wrote:
> >>> Hi,
> >>>
> >>> Timo and Mark and I have been discussing this, and we think the right
> >>> thing to do is add support to vf_scale_cuda to respect the crop
> >>> properties on an input AVFrame. Mark posted a patch to vf_crop to
> >>> ensure that the properties are set, and then the scale filter should
> >>> respect those properties if they are set. You can look at
> >>> vf_scale_vaapi for how the properties are read, but they will require
> >>> explicit handling to adjust the src dimensions passed to the scale
> >>> filter.

That may be a little unintuitive to users.

> >>>
> >>> This will be a more efficient way of handling crops, in terms of total
> >>> lines of code and also allowing crop/scale with one less copy.
> >>>
> >>> I know this is quite different from the approach you've taken here, and
> >>> we appreciate the work you've done, but it should be better overall to
> >>> implement this integrated method.
> >> Hi Philip,
> >>
> >> Glad to hear you guys had a discussion on this. As I am also considering
> >> the problem, I have some questions about your idea.
> >> So, what if the user does not insert a scale_cuda after the crop filter?
> >> Do you plan to automatically insert scale_cuda, or just ignore the crop?
> >> And what if the user wants to do crop,transpose_cuda,scale_cuda? Do we
> >> then also need to handle crop inside the transpose_cuda filter?
> >
> > I have the same question.
> Ideally, scale_cuda should be auto-inserted at the required places once
> it works that way.
> Otherwise it seems pointless to me if the user still has to manually
> insert it after the generic filters setting metadata.

Agree.
> For that reason it should also still support getting its parameters
> passed directly as a fallback, and potentially even expose multiple
> filter names, so crop_cuda and transpose_cuda are still visible, but
> ultimately point to the same filter code.
>
> We have transpose_npp right now, but with libnpp slowly on its way out,
> transpose_cuda is needed, and ultimately even a format_cuda filter,
> since right now scale_npp is the only filter that can convert pixel
> formats on the hardware.
> I'd also like to see scale_cuda support a few more interpolation
> algorithms, but that's not very important for now.
>
> All this functionality can be in the same filter, which is scale_cuda.
> The point of that is that it avoids needless expensive frame copies as
> much as possible.

For crop/transpose, these are just copy-like kernels, so it may be a good
idea to merge them with other kernels. But I am not sure how much overall
performance gain we would get for a full transcoding pipeline, and merging
everything together may make the code very complex. For example, crop+scale
or crop+transpose may be easy to merge, but crop+transpose+scale or
crop+transpose+scale+format will be much more complex.

I want to share some of my experience from developing the OpenCL scale
filter (https://patchwork.ffmpeg.org/patch/11910/). I tried to merge scale
and format conversion into one single OpenCL kernel, but I failed to keep
the code clean once I added support for interpolation methods like bicubic,
so I now plan to separate them into two kernels. My experiments on
scale_opencl also show that merging scale with format conversion does not
always pay off: for 1080p scale-down, merging the two operations is about
10% faster (for decode+scale), but for 4K input, merging the two kernels
makes it slower. My guess is that the different planes compete for the
limited GPU cache.
For scale-only, we can process the frame plane by plane, but for format
conversion you have to read all the input planes and write all the output
planes at the same time. This is just my guess; I have not root-caused the
real reason. But keeping scale and format conversion in separate kernel
functions seems better to me.

I am also looking at this issue the other way around: would it be possible
to simply do the needed copy in crop/transpose, and then optimize one of
the filters away when two such filters are neighbors, passing its options
to the other while configuring the filter pipeline?

I am definitely interested to see the work you described happen in FFmpeg.

Thanks!
Ruiling

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".