Recent HM versions use assert() to verify that a stream conforms to the level specified in the VPS/SPS. We're getting bug reports about this, e.g. http://f265.org/bugs/ticket/24
The level (and tier) affects the following constraints:
- Maximum picture size and aspect ratio.
- Maximum frame rate.
- Maximum bit rate.
- Maximum DPB size.
- Maximum number of slices.
- Maximum number of tiles (rows and columns).
- Minimum size of the tiles (rows and columns).
- Minimum compression ratio.
- Minimum CTB size (32x32 for level 5 and above).

The first thing to note is that the whole thing is a royal pain in the ass. It's like the MaxMvsPer2Mb constraint in H.264. Moving on, the question is what we do about it. We have several constraints on our side.

1) Our regressions must keep working with our legacy HM version. That version is hardcoded to use level 5.1. Some of our regression tests use 16x16 CTBs, which is illegal with level 5.1.

2) We must be able to encode large videos (4K and beyond) at very low quality, for which the analysis of 32x32 CTBs is not an option.

3) Letting HM fail to decode our streams is not an option. People rely on it (including us).

Let's assume for a moment that we genuinely want to support the level indicator in good faith, as the spec people defined it. That means using the minimum level that still respects all the constraints, so that (in theory) a decoder doesn't refuse to decode the stream because the level is higher than what it can safely support.

Now, f265 is a general encoder. We don't control all the use cases people can potentially put it to. Constant QP and dynamic bit rate or frame rate adaptation prevent us from knowing the effective bit rate, and thus the minimum level we should use. So what do we do?

We can put the burden on the user: "read the spec, figure out the constraints, figure out the worst case for your current video, then specify the level on the command line". That's obviously not a practical solution.

We can guarantee spec conformance by always using the maximum level defined by the spec. In the real world, most sane decoders will try to decode the stream regardless of the level indicator.
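For the sake of discussion, here is a rough sketch (not f265 code) of what "use the minimum level that fits" would look like, reduced to two of the limits (picture size and main-tier bit rate) plus the CTB-size rule. The limit values are my reading of Tables A.6/A.8 of the HEVC spec and should be double-checked against the published text; the names are made up for this example:

```c
#include <stdint.h>

/* One row per level; general_level_idc = 30 * level number. */
typedef struct {
    int level_idc;          /* E.g. 153 for level 5.1, 186 for 6.2. */
    uint32_t max_luma_ps;   /* Max luma picture size, in samples. */
    uint32_t max_br_main;   /* Max bit rate, main tier, in kbit/s. */
} hevc_level_t;

static const hevc_level_t levels[] = {
    { 90,    552960,   6000 }, /* 3   */
    { 93,    983040,  10000 }, /* 3.1 */
    { 120,  2228224,  12000 }, /* 4   */
    { 123,  2228224,  20000 }, /* 4.1 */
    { 150,  8912896,  25000 }, /* 5   */
    { 153,  8912896,  40000 }, /* 5.1 */
    { 156,  8912896,  60000 }, /* 5.2 */
    { 180, 35651584,  60000 }, /* 6   */
    { 183, 35651584, 120000 }, /* 6.1 */
    { 186, 35651584, 240000 }, /* 6.2 */
};

/* Return the level_idc of the smallest level whose limits cover the
 * stream, or -1 if no level fits. Levels 5 and above additionally
 * require 32x32 CTBs or larger. */
int min_level_idc(uint32_t luma_ps, uint32_t br_kbps, int ctb_size)
{
    for (unsigned i = 0; i < sizeof(levels) / sizeof(levels[0]); i++) {
        if (luma_ps > levels[i].max_luma_ps) continue;
        if (br_kbps > levels[i].max_br_main) continue;
        if (levels[i].level_idc >= 150 && ctb_size < 32) continue;
        return levels[i].level_idc;
    }
    return -1;
}
```

Note what happens with a 4K picture and 16x16 CTBs: every level large enough for the picture also requires 32x32 CTBs, so there is no conformant level at all. That's constraint 2) in a nutshell.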
The key word in "most sane decoders" being "most".

We can make a guess: define the level using some heuristics so that the stream is conformant most of the time. The key word, again, being "most".

And that's it. It's a catch-22 situation; there doesn't seem to be a sane way out of this. It's worth noting that it's not just f265 facing this problem: everyone making a practical encoder is.

My reasoning is that practical decoders (aside from HM) will have no choice but to decode streams correctly even if the level is incorrect, whether it's too low (because the user messed up or the encoder heuristics failed) or way too high (because the encoder is playing it safe for conformance).

I'm inclined to always use the maximum level, to reduce the number of bug reports of the type "HM refuses to decode my stream" and "lambda analyzer/decoder says constraint X busted". That would completely subvert the intended usage of the level indicator. That's unfortunate, but we didn't create this mess. The HM people seem happy to break the decoding of streams that used to decode correctly, while defining a set of constraints that we can't always respect in practice. It seems the safest and easiest of the three options.

I would provide a "level" parameter on the command line that the user can set explicitly. If left at 0, the encoder sets it to level 6.2. We fix the analysis to always use 32x32 CTBs (we ignore the 32x32 CB if not desired), unless the maximum CTB size is explicitly signalled on the command line. The key points are that we're conformant by default and that the level is decoupled from the parameters used. If that causes the stream to become non-conformant, so be it. We can add an option later to auto-guess the level (off by default).

What do you guys think?

Laurent

--
To unsubscribe visit http://f265.org or send a mail to [email protected].
