Quoting wm4 (2017-02-19 14:48:36)
> On Sun, 19 Feb 2017 11:09:49 +0100
> Anton Khirnov <[email protected]> wrote:
> 
> > Some parts of the code are based on a patch by
> > Timo Rothenpieler <[email protected]>
> > ---
> > Compared to the ffmpeg patch which implements cuvid as a separate decoder using
> > the higher-level parser API (nvcuvid.h), I did it as a classic hwaccel using
> > the lower-level decoder API (cuviddec.h).
> > IMO, this has a number of advantages:
> >  - integrates much better with the existing acceleration infrastructure/APIs
> >  - supports stream parameter changes
> >  - the code is much simpler
> >  - software fallback
> >  - various features from h264dec, such as handling weird invalid streams or
> >    exporting metadata from SEIs
> > 
> 
> > One question to be resolved is retrieving the frames. The way the API works is
> > that the decoder maintains an internal pool of frames, to which the caller
> > refers by their indices. When you want the data, you map the frame, which allows
> > you to copy its contents to a normal CUDA frame. To get optimal performance,
> > this map+copy needs to be delayed wrt decoding by a few frames, so the question
> > is how this should be done. The options I see are:
> >  - introduce a new pixel format, AV_PIX_FMT_CUVID, which wraps the frame index
> >    and allows transfer to CUDA via av_hwframe_transfer_data(). Then either
> >    * Return those PIX_FMT_CUVID frames to the caller and let them do the copy
> >      manually. This is most flexible, but more work for the caller and might
> >      mean synchronization problems, so we'd need to add locks (perhaps to the
> >      CUVID frames context).
> >    * Handle delay+map+copy somewhere else in lavc. The question is where
> >      would the right place be. Janne suggested at FOSDEM to add a dummy decoder,
> >      h264_cuvid wrapping h264dec, which would do the delay and copy. That should
> >      work, but isn't very elegant.
> >  - we could also add some sort of a "postprocess" stage to AVHWaccel, run before
> >    returning a frame from decode(), or perhaps invoked separately by the lavc
> >    generic code.
> > This issue might be relevant to other future hwaccels as well (VT?), so ideally
> > the solution would be generic. Comments and further suggestions very much
> > welcome.
> 
> What is with all this complexity? Is this about the final read-back if
> you want to decode to system RAM? In this case, let the API user do it,
> like any decent API user already does, and which your first point
> suggests. (This means you need to hack avconv.c.) Not sure why "locks"
> would be needed for this.

No, this is about reading the frame from the internal decoder pool into
user-managed GPU memory.
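
To make the delay concrete, here is a minimal sketch in plain C of the
bookkeeping involved: a small FIFO of decoder-pool surface indices, where an
index only becomes eligible for map+copy once a few later frames have been
queued behind it. Everything in it is hypothetical and for illustration only
(DelayQueue, delay_queue_push(), delay_queue_flush(), and the DELAY_FRAMES
value are not part of the patch or of any CUVID/lavc API), and the actual
map+copy call is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical delay, in frames, between decoding and map+copy.
 * The right value would depend on the hardware and driver. */
#define DELAY_FRAMES 4
#define QUEUE_CAP    (DELAY_FRAMES + 1)

/* FIFO of decoder-pool surface indices that still await map+copy. */
typedef struct DelayQueue {
    int    idx[QUEUE_CAP];
    size_t head;  /* position of the oldest queued index */
    size_t count; /* number of queued indices */
} DelayQueue;

static void delay_queue_init(DelayQueue *q)
{
    q->head  = 0;
    q->count = 0;
}

/* Remove and return the oldest queued index, or -1 if the queue is empty. */
static int delay_queue_pop(DelayQueue *q)
{
    int out;

    if (q->count == 0)
        return -1;
    out     = q->idx[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return out;
}

/* Queue a freshly decoded surface index. Returns the index that is now
 * old enough to be mapped and copied, or -1 while the queue is filling. */
static int delay_queue_push(DelayQueue *q, int surface_idx)
{
    assert(q->count < QUEUE_CAP);
    q->idx[(q->head + q->count) % QUEUE_CAP] = surface_idx;
    q->count++;

    if (q->count <= DELAY_FRAMES)
        return -1;
    return delay_queue_pop(q);
}

/* At end of stream, call repeatedly to drain the remaining indices;
 * returns -1 once nothing is left. */
static int delay_queue_flush(DelayQueue *q)
{
    return delay_queue_pop(q);
}
```

Whichever component ends up owning such a queue (the caller, a wrapper
decoder, or a postprocess stage in the hwaccel) is exactly the open question
in the original mail; the sketch does not take a position on that.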

-- 
Anton Khirnov
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
