Quoting wm4 (2017-02-19 14:48:36)
> On Sun, 19 Feb 2017 11:09:49 +0100
> Anton Khirnov <[email protected]> wrote:
>
> > Some parts of the code are based on a patch by
> > Timo Rothenpieler <[email protected]>
> > ---
> > Compared to the ffmpeg patch which implements cuvid as a separate decoder using
> > the higher-level parser API (nvcuvid.h), I did it as a classic hwaccel using
> > the lower-level decoder API (cuviddec.h).
> > IMO, this has a number of advantages:
> > - integrates much better with the existing acceleration infrastructure/APIs
> > - supports stream parameter changes
> > - the code is much simpler
> > - software fallback
> > - various features from h264dec, such as handling weird invalid streams or
> >   exporting metadata from SEIs
> >
> > One question to be resolved is retrieving the frames. The way the API works is
> > that the decoder maintains an internal pool of frames, to which the caller
> > refers by their indices. When you want the data, you map the frame, which allows
> > you to copy its contents to a normal CUDA frame. To get optimal performance,
> > this map+copy needs to be delayed wrt decoding by a few frames, so the question
> > is how this should be done. The options I see are:
> > - introduce a new pixel format, AV_PIX_FMT_CUVID, which wraps the frame index
> >   and allows transfer to CUDA via av_hwframe_transfer_data(). Then either
> >     * Return those PIX_FMT_CUVID frames to the caller and let them do the copy
> >       manually. This is most flexible, but more work for the caller and might
> >       mean synchronization problems, so we'd need to add locks (perhaps to the
> >       CUVID frames context).
> >     * Handle delay+map+copy somewhere else in lavc. The question is where
> >       would the right place be. Janne suggested at FOSDEM to add a dummy decoder,
> >       h264_cuvid wrapping h264dec, which would do the delay and copy. That should
> >       work, but isn't very elegant.
> > - we could also add some sort of a "postprocess" stage to AVHWaccel, run before
> >   returning a frame from decode(), or perhaps invoked separately by the lavc
> >   generic code.
> > This issue might be relevant to other future hwaccels as well (VT?), so ideally
> > the solution would be generic. Comments and further suggestions very much
> > welcome.
>
> What is with all this complexity? Is this about the final read-back if
> you want to decode to system RAM? In this case, let the API user do it,
> like any decent API user already does, and which your first point
> suggests. (This means you need to hack avconv.c.) Not sure why "locks"
> would be needed for this.

No, this is about reading the frame from the internal decoder pool into
user-managed GPU memory.

--
Anton Khirnov
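
For illustration, the "delay the map+copy by a few frames" idea from the
quoted patch notes can be sketched as a small FIFO of decoder pool indices.
This is only a minimal sketch under stated assumptions, not code from the
patch: DelayQueue, DELAY_DEPTH and the delay_queue_* functions are
hypothetical names, the depth of 4 is an arbitrary choice, and the actual
map and copy into a CUDA frame is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how many frames decoding is allowed to run ahead of the
 * map+copy step. The value 4 is an arbitrary choice for this sketch. */
#define DELAY_DEPTH 4

/* Hypothetical FIFO of decoder pool indices awaiting map+copy. */
typedef struct DelayQueue {
    int    slots[DELAY_DEPTH];
    size_t head;   /* position of the oldest queued index */
    size_t count;  /* number of indices currently queued */
} DelayQueue;

static void delay_queue_init(DelayQueue *q)
{
    q->head  = 0;
    q->count = 0;
}

/* Queue a freshly decoded pool index. Returns the pool index that is now
 * old enough to be mapped and copied, or -1 while the queue is filling. */
static int delay_queue_push(DelayQueue *q, int pool_idx)
{
    int ready = -1;

    assert(pool_idx >= 0); /* -1 is reserved as the "nothing ready" value */

    if (q->count == DELAY_DEPTH) {
        /* Queue is full: release the oldest index to the caller. */
        ready   = q->slots[q->head];
        q->head = (q->head + 1) % DELAY_DEPTH;
        q->count--;
    }
    q->slots[(q->head + q->count) % DELAY_DEPTH] = pool_idx;
    q->count++;
    return ready;
}

/* Drain one index at end of stream. Returns -1 once the queue is empty. */
static int delay_queue_flush(DelayQueue *q)
{
    int ready;

    if (q->count == 0)
        return -1;
    ready   = q->slots[q->head];
    q->head = (q->head + 1) % DELAY_DEPTH;
    q->count--;
    return ready;
}
```

Whoever owns such a queue (the caller, a wrapper decoder, or a postprocess
stage, per the options in the quoted text) would map and copy each index
returned by delay_queue_push(), then call delay_queue_flush() repeatedly at
end of stream.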
