Recent HM versions now use assert() to verify the conformance of a
stream with the level specified in the VPS/SPS. We're getting bug
reports, e.g. http://f265.org/bugs/ticket/24

The level (and tier) affects the following constraints:
- Maximum picture size and aspect ratio.
- Maximum frame rate.
- Maximum bit rate.
- Maximum DPB size.
- Maximum number of slices.
- Maximum number of tiles (rows and columns).
- Minimum size of the tiles (rows and columns).
- Minimum compression ratio.
- Minimum CTB size (32x32 for level 5 and above).
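
The CTB constraint is the one that bites us directly. A minimal sketch
of the check (not f265 code, names are hypothetical; level_idc is
30 * level as signalled in the bitstream, e.g. 153 for level 5.1):

```c
/* Hypothetical sketch: the CTB-size constraint from HEVC A.4.1.
 * level_idc = 30 * level (153 = level 5.1, 186 = level 6.2). */
static int ctb_size_allowed(int level_idc, int ctb_size)
{
    /* Levels 5 and above require CtbSizeY >= 32. */
    if (level_idc >= 150 && ctb_size < 32)
        return 0;
    return 1;
}
```

This is exactly the kind of assert the recent HM versions trip on: a
16x16 CTB stream signalled as level 5.1 fails the check.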

The first thing to note is that the whole thing is a royal pain in the
ass. It's like the MaxMvsPer2Mb constraint in H.264. Moving on, the
question is what we do about it. We have several constraints on our side.

1) Our regressions must keep working with our legacy HM version. That
version is hardcoded to use level 5.1. Some of our regression tests use
16x16 CTBs, which is illegal with level 5.1.

2) We must be able to encode large videos (4k and beyond) at very low
quality, for which the analysis of 32x32 CTBs is not an option.

3) Letting HM fail to decode our streams is not an option. People rely
on it (including us).


Let's assume for a moment that we genuinely want to support the level
indicator in good faith as the spec people defined it. That means using
the minimum level that still respects all the constraints, so that (in
theory) a decoder doesn't refuse to decode the stream because the level
is higher than what it can safely support.
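
To make this concrete, here is a hedged sketch of what "minimum level"
selection looks like for the picture-size constraint alone (MaxLumaPs
values taken from HEVC Table A.6, levels with identical limits merged;
everything else, bit rate, DPB, tiles, is ignored here, so this is an
illustration, not a complete conformance check):

```c
/* Hypothetical sketch: smallest level whose MaxLumaPs (HEVC Table A.6)
 * admits the picture. idc = 30 * level. Ignores all other constraints. */
static const struct { int idc; int max_luma_ps; } k_levels[] = {
    { 30,    36864 },   /* 1   */
    { 60,   122880 },   /* 2   */
    { 63,   245760 },   /* 2.1 */
    { 90,   552960 },   /* 3   */
    { 93,   983040 },   /* 3.1 */
    { 120, 2228224 },   /* 4   */
    { 150, 8912896 },   /* 5   */
    { 180, 35651584 },  /* 6   */
};

static int min_level_idc_for_pic(int width, int height)
{
    int luma_ps = width * height;
    for (unsigned i = 0; i < sizeof(k_levels) / sizeof(k_levels[0]); i++)
        if (luma_ps <= k_levels[i].max_luma_ps)
            return k_levels[i].idc;
    return -1;  /* exceeds the level 6 limit */
}
```

Picture size is the easy part, since it's known up front. The trouble,
as discussed below, is everything that isn't.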

Now, f265 is a general encoder. We don't control all the use cases that
people can potentially use it for. Constant QP and dynamic bit rate or
frame rate adaptation prevent us from knowing the effective bit rate and
thus the minimum level we should use. What do we do, then?

We can put the burden on the user. "Read the spec, figure out the
constraints, figure out the worst case for your current video, then
specify the level on the command line". That's obviously not a practical
solution.

We can guarantee spec conformance. We just use the maximum level defined
by the spec. In the real world, most sane decoders will try to decode
the stream regardless of the level indicator. The key word being "most".

We can make a guess. We define the level using some heuristics so that
the stream is conformant most of the time. The key word being "most".

And that's it. It's a catch-22 situation. There doesn't seem to be a
sane way out of this. It's worth noting that it's not just f265 that is
facing this problem, everyone making a practical encoder is. My
reasoning is that practical decoders (aside from HM) will have no choice
but to decode streams correctly even if the level is wrong (either too
low because the user messed up or the encoder heuristics failed, or way
too high because the encoder is playing it safe for conformance).

I'm inclined to always use the maximum level to reduce the number of bug
reports of the type "HM refuses to decode my stream" and "lambda
analyzer/decoder says constraint X busted". That would completely
subvert the intended usage of the level indicator. That's unfortunate,
but we didn't create this mess. The HM people seem happy to break the
decoding of streams that used to decode correctly, while defining a set
of constraints that we can't always respect in practice. It seems the
safest and easiest option out of the three.

I would provide a "level" parameter on the command line that the user
can set explicitly. If left at 0, the encoder will set it to level 6.2.
We fix the analysis to always use 32x32 CTBs (we ignore the 32x32 CB if
not desired), unless the maximum CTB size is explicitly signalled on the
command line. The key points are that we're conformant by default and
that the level is decoupled from the parameters used. If that causes the
stream to become non-conformant, so be it. We can add an option later to
auto-guess the level (off by default).
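
In code, the proposed default is a one-liner (hypothetical sketch, the
parameter name and mapping are assumptions; the idc mapping is the
standard 30 * level):

```c
/* Hypothetical sketch of the proposed default: user_level is the
 * command-line "level" parameter (e.g. 5.1), 0 means "encoder picks",
 * which under this proposal is always 6.2, the maximum level defined
 * by the spec. Returns general_level_idc (30 * level). */
static int pick_level_idc(double user_level)
{
    if (user_level == 0.0)
        user_level = 6.2;
    return (int)(user_level * 30.0 + 0.5);  /* 6.2 -> 186, 5.1 -> 153 */
}
```

The point being that the level written in the VPS/SPS never feeds back
into the analysis; the CTB size and the level are set independently.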

What do you guys think?

Laurent
