Re: [RFC] Stateful codecs and requirements for compressed formats

Nicolas Dufresne Wed, 10 Jul 2019 18:41:19 -0700

On Wednesday, 10 July 2019 at 10:43 +0200, Hans Verkuil wrote:
> On 6/28/19 8:09 PM, Nicolas Dufresne wrote:
> > On Friday, 28 June 2019 at 16:34 +0200, Hans Verkuil wrote:
> > > Hi all,
> > > 
> > > I hope I Cc-ed everyone with a stake in this issue.
> > > 
> > > One recurring question is how a stateful encoder fills buffers and how
> > > a stateful decoder consumes buffers.
> > > 
> > > The most generic case is that an encoder produces a bitstream and just
> > > fills each CAPTURE buffer to the brim before continuing with the next
> > > buffer.
> > > 
> > > I don't think there are drivers that do this, I believe that all
> > > drivers just output a single compressed frame. For interlaced formats
> > > I understand it is either one compressed field per buffer, or two
> > > compressed fields per buffer (this is what I heard, I don't know if
> > > this is true).
> > > 
> > > In any case, I don't think this is specified anywhere. Please correct
> > > me if I am wrong.
> > > 
> > > The latest stateful codec spec is here:
> > > 
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > > 
> > > Assuming what I described above is indeed the case, then I think this
> > > should be documented. I don't know enough if a flag is needed
> > > somewhere to describe the behavior for interlaced formats, or can we
> > > leave this open and have userspace detect this?
> > > 
> > > 
> > > For decoders it is more complicated. The stateful decoder spec is
> > > written with the assumption that userspace can just fill each OUTPUT
> > > buffer to the brim with the compressed bitstream. I.e., no need to
> > > split at frame or other boundaries.
> > > 
> > > See section 4.5.1.7 in the spec.
> > > 
> > > But I understand that various HW decoders *do* have limitations. I
> > > would really like to know about those, since that needs to be exposed
> > > to userspace somehow.
> > > 
> > > Specifically, the venus decoder needs to know the resolution of the
> > > coded video beforehand and it expects a single frame per buffer (how
> > > does that work for interlaced formats?).
> > > 
> > > Such requirements mean that some userspace parsing is still required,
> > > so these decoders are not completely stateful.
> > > 
> > > Can every codec author give information about their decoder/encoder?
> > > 
> > > I'll start off with my virtual codec driver:
> > > 
> > > vicodec: the decoder fully parses the bitstream. The encoder produces
> > > a single compressed frame per buffer. This driver doesn't yet support
> > > interlaced formats, but when that is added it will encode one field
> > > per buffer.
> > > 
> > > Let's see what the results are.
> > 
> > Hans thought a summary of what existing userspace expects / assumes
> > would be nice.
> > 
> > GStreamer:
> > ==========
> > Encodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> > Decodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > It assumes that each encoded v4l2_buffer contains exactly one frame
> > (any format, two fields for interlaced content). It may still work
> > otherwise, but some issues will appear: timestamp shifts, loss of
> > metadata (e.g. timecode, cc, etc.).
> 
> When you say 'each encoded v4l2_buffer contains exactly one frame',
> does that include H.264 SPS/PPS headers? Or are those passed in
> a separate v4l2_buffer?


Yes, the SPS/PPS is assumed to be in the same buffer. In the case of
the decoder it's guaranteed to be; if the decoder does not do that, it
will still work, with a timestamp shift.
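
To make "in the same buffer" concrete, here is a rough sketch (not taken
from GStreamer or any driver) of how one could inspect an H.264 Annex B
buffer and see whether SPS/PPS travel together with the slice data. The
function names are made up for illustration, and counting slice NAL
units is only a hint, since one frame may be split into several slices.

```python
START_CODE = b"\x00\x00\x01"

# H.264 nal_unit_type values (ITU-T H.264, Table 7-1)
NAL_SLICE = 1      # coded slice of a non-IDR picture
NAL_IDR_SLICE = 5  # coded slice of an IDR picture
NAL_SPS = 7        # sequence parameter set
NAL_PPS = 8        # picture parameter set


def split_annexb(buf: bytes):
    """Return the NAL units of an Annex B byte stream, start codes removed."""
    nals = []
    pos = buf.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = buf.find(START_CODE, begin)
        end = len(buf) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros are either
        # padding or the extra leading zero of a 4-byte start code.
        nal = buf[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def nal_types(buf: bytes):
    """Return the nal_unit_type (low 5 bits of the first byte) of each NAL."""
    return [nal[0] & 0x1F for nal in split_annexb(buf)]


def describe_buffer(buf: bytes):
    """Summarise what one compressed buffer carries (hypothetical helper)."""
    types = nal_types(buf)
    return {
        "has_sps": NAL_SPS in types,
        "has_pps": NAL_PPS in types,
        "has_idr": NAL_IDR_SLICE in types,
        "slice_nals": sum(t in (NAL_SLICE, NAL_IDR_SLICE) for t in types),
    }
```

A buffer for which describe_buffer() reports has_sps and has_pps but no
slice NAL units would be the "headers in a separate v4l2_buffer" case
asked about above.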

> Ditto for FFMPEG.

I believe it's the same, but I'd need to re-read that code to confirm.
The thing about FFMPEG is that the internal format is always AVC
instead of bytestream. And the PPS/SPS travels out-of-band, which means
it's not inside an AVPacket internally.
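
For readers not familiar with the distinction: in the "AVC"
(length-prefixed) layout each NAL unit is preceded by a big-endian
length field, while the bytestream (Annex B) layout uses start codes.
A minimal sketch of the conversion, assuming the out-of-band SPS/PPS
have already been extracted as raw NAL units and that the length field
is 4 bytes (in practice its size comes from the stream's extradata).
The function names are hypothetical and this is not FFmpeg's code.

```python
START_CODE4 = b"\x00\x00\x00\x01"


def avcc_to_annexb(packet: bytes, length_size: int = 4) -> bytes:
    """Rewrite length-prefixed NAL units as start-code-prefixed ones."""
    out = bytearray()
    pos = 0
    while pos < len(packet):
        if pos + length_size > len(packet):
            raise ValueError("truncated NAL length field")
        nal_len = int.from_bytes(packet[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(packet):
            raise ValueError("truncated NAL unit")
        out += START_CODE4 + packet[pos:pos + nal_len]
        pos += nal_len
    return bytes(out)


def with_parameter_sets(sps_list, pps_list, annexb_frame: bytes) -> bytes:
    """Put out-of-band SPS/PPS in front of a frame, in the same buffer."""
    out = bytearray()
    for nal in list(sps_list) + list(pps_list):
        out += START_CODE4 + nal
    out += annexb_frame
    return bytes(out)
```

The result of with_parameter_sets() is what would have to be queued in
a single OUTPUT buffer for the SPS/PPS to stay with the frame they
belong to.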

> 
> Regards,
> 
>       Hans
> 
> > FFMpeg:
> > =======
> > Encodes:
> >   h263, h264, hevc, mpeg4, vp8
> > Decodes:
> >   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > Similarly to GStreamer, it assumes that one AVPacket will fit one
> > v4l2_buffer. On the encoding side, it seems less of a problem, but they
> > don't fully implement the FFMPEG CODEC API for frame matching, which I
> > suspect would create some ambiguity if they did.
> > 
> > Chromium:
> > =========
> > Decodes:
> >   H264, VP8, VP9
> > Encodes:
> >   H264
> > 
> > That is the code I know the least, but the encoder does not seem
> > affected by the NAL alignment. The keyframe flag and timestamps seem
> > to be used and are likely expected to correlate with the input, so I
> > suspect that some ambiguity is possible if the output is not a
> > full frame. For the decoder, I'll have to ask someone else to comment;
> > the code is hard to follow and I could not get to the place where
> > output buffers are filled. I thought the GStreamer code was tough, but
> > this is a similar mess.
> > 
> > Nicolas
