On Wednesday, July 3, 2019 at 17:46 +0900, Tomasz Figa wrote:
> On Sat, Jun 29, 2019 at 3:09 AM Nicolas Dufresne <nico...@ndufresne.ca> wrote:
> > On Friday, June 28, 2019 at 16:34 +0200, Hans Verkuil wrote:
> > > Hi all,
> > > 
> > > I hope I Cc-ed everyone with a stake in this issue.
> > > 
> > > One recurring question is how a stateful encoder fills buffers and how a 
> > > stateful
> > > decoder consumes buffers.
> > > 
> > > The most generic case is that an encoder produces a bitstream and just 
> > > fills each
> > > CAPTURE buffer to the brim before continuing with the next buffer.
> > > 
> > > I don't think there are drivers that do this, I believe that all drivers 
> > > just
> > > output a single compressed frame. For interlaced formats I understand it 
> > > is either
> > > one compressed field per buffer, or two compressed fields per buffer 
> > > (this is
> > > what I heard, I don't know if this is true).
> > > 
> > > In any case, I don't think this is specified anywhere. Please correct me 
> > > if I am
> > > wrong.
> > > 
> > > The latest stateful codec spec is here:
> > > 
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > > 
> > > Assuming what I described above is indeed the case, then I think this 
> > > should
> > > be documented. I don't know enough if a flag is needed somewhere to 
> > > describe
> > > the behavior for interlaced formats, or can we leave this open and have 
> > > userspace
> > > detect this?
> > > 
> > > 
> > > For decoders it is more complicated. The stateful decoder spec is written 
> > > with
> > > the assumption that userspace can just fill each OUTPUT buffer to the 
> > > brim with
> > > the compressed bitstream. I.e., no need to split at frame or other 
> > > boundaries.
> > > 
> > > See section 4.5.1.7 in the spec.
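To make that assumption concrete: under the "fill to the brim" model, userspace would chop the bitstream at arbitrary byte offsets with no regard for frame or NAL boundaries. A rough sketch of that chunking logic (the helper name and layout are made up for illustration, not from any driver or library):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper illustrating the "fill to the brim" model: the
 * compressed bitstream is cut at arbitrary byte offsets, ignoring frame
 * and NAL boundaries.  Each (offset, length) pair is what would become
 * one queued OUTPUT v4l2_buffer.  Returns the number of buffers used. */
static size_t fill_output_buffers(const unsigned char *bitstream, size_t len,
                                  size_t buf_size,
                                  size_t (*fills)[2], size_t max_fills)
{
    size_t n = 0, off = 0;

    while (off < len && n < max_fills) {
        size_t chunk = len - off < buf_size ? len - off : buf_size;

        fills[n][0] = off;   /* offset into the bitstream */
        fills[n][1] = chunk; /* bytesused for this buffer */
        off += chunk;
        n++;
    }
    return n;
}
```

A decoder with per-frame or resolution requirements (like venus, below) cannot accept input produced this way, which is exactly why the limitation needs to be discoverable by userspace.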
> > > 
> > > But I understand that various HW decoders *do* have limitations. I would 
> > > really
> > > like to know about those, since that needs to be exposed to userspace 
> > > somehow.
> > > 
> > > Specifically, the venus decoder needs to know the resolution of the coded 
> > > video
> > > beforehand and it expects a single frame per buffer (how does that work 
> > > for
> > > interlaced formats?).
> > > 
> > > Such requirements mean that some userspace parsing is still required, so 
> > > these
> > > decoders are not completely stateful.
> > > 
> > > Can every codec author give information about their decoder/encoder?
> > > 
> > > I'll start off with my virtual codec driver:
> > > 
> > > vicodec: the decoder fully parses the bitstream. The encoder produces a 
> > > single
> > > compressed frame per buffer. This driver doesn't yet support interlaced 
> > > formats,
> > > but when that is added it will encode one field per buffer.
> > > 
> > > Let's see what the results are.
> > 
> > Hans, I think a summary of what existing userspace expects / assumes
> > would be nice.
> > 
> > GStreamer:
> > ==========
> > Encodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> > Decodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > It assumes that each encoded v4l2_buffer contains exactly one frame
> > (any format; two fields for interlaced content). It may still work
> > otherwise, but some issues will appear: timestamp shifts and loss of
> > metadata (e.g. timecode, closed captions, etc.).
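(To illustrate why the one-frame-per-buffer assumption matters: the stateful decoder spec has the driver copy the OUTPUT buffer timestamp onto the CAPTURE buffer carrying the resulting frame, and userspace uses that timestamp as a key to recover per-frame metadata. The struct and function below are an illustrative sketch of that matching, not GStreamer code:)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of timestamp-based metadata matching.  The timestamp queued on
 * the OUTPUT buffer is copied by the driver to the CAPTURE buffer of the
 * decoded frame; userspace looks it up to reattach metadata (timecode,
 * closed captions, ...).  This only stays unambiguous if each OUTPUT
 * buffer holds exactly one encoded frame. */
struct frame_meta {
    uint64_t timestamp;   /* key copied OUTPUT -> CAPTURE by the driver */
    const char *timecode; /* example metadata carried alongside */
};

static const char *lookup_meta(const struct frame_meta *pending, size_t n,
                               uint64_t capture_ts)
{
    for (size_t i = 0; i < n; i++)
        if (pending[i].timestamp == capture_ts)
            return pending[i].timecode;
    return NULL; /* timestamp shift: no matching input frame */
}
```

If a buffer carries half a frame or two frames, the dequeued CAPTURE timestamp no longer maps one-to-one onto queued input, and the lookup above returns the wrong entry or nothing.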
> > 
> > FFmpeg:
> > =======
> > Encodes:
> >   h263, h264, hevc, mpeg4, vp8
> > Decodes:
> >   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > Similar to GStreamer, it assumes that one AVPacket will fit in one
> > v4l2_buffer. On the encoding side it seems less of a problem, but they
> > don't fully implement the FFmpeg codec API for frame matching, which I
> > suspect would create some ambiguity if they did.
> > 
> > Chromium:
> > =========
> > Decodes:
> >   H264, VP8, VP9
> > Encodes:
> >   H264
> 
> VP8 too.
> 
> It can in theory handle any format V4L2 could expose, but these two
> seem to be the only codecs commonly used in practice and supported by
> hardware.
> 
> > That is the code I know the least, but the encoder does not seem
> > affected by the NAL alignment. The keyframe flag and timestamps seem
> > to be used and are likely expected to correlate with the input, so I
> > suspect that some ambiguity is possible if the output is not a full
> > frame. For the decoder, I'll have to ask someone else to comment;
> > the code is hard to follow and I could not get to the place where
> > output buffers are filled. I thought the GStreamer code was tough,
> > but this is quite similarly a mess.
> 
> Not sure what's so complicated there. There is a clearly isolated
> function that does the parsing:
> https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.cc?rcl=2880fe4f6b246809f1be72c5a5698dced4cd85d1&l=984
> 
> It puts special NALUs like SPS and PPS in separate buffers and for
> frames it's 1 frame (all slices of the frame) : 1 buffer.

Consider this as feedback, but the mix of parsing with decoding, along
with the name of the method "::AdvanceFrameFragment", makes the code
hard to follow.

Thanks for pointing to this code. Was there any HW where this split was
strictly required?

> 
> Best regards,
> Tomasz
