Andreas Rheinhardt: > Hello, > > here is the patchset for the Matroska demuxer that I already announced > last week [1]. It is an improvement upon an earlier patchset from March [2] > which has been kindly reviewed by Steve Lhomme. Just as the earlier > attempt, it is intended to get rid of a bogus error message introduced > in 9326117b. > > 1. Here is a recap of why this bogus error message exists: > a) The ebml parser contained in the Matroska muxer basically has two > modes of operation: It can parse simple elements and it can parse whole > master elements all at once. The latter is of course implemented via the > former. Also one can designate certain element IDs as "EBML_STOP" type > so that when such an element is encountered, parsing immediately stops > without parsing the length and performing the checks (regarding whether > an element is truly contained in the master element it is supposed to be > contained in --these checks were introduced in 9326117b). > b) Normal parsing (via matroska_parse_cluster_incremental) uses a syntax > list for parsing that contains elements at different levels in the > hierarchy: The elements that can be found in a cluster and (certain) > level 1 elements (i.e. the level of the cluster); in particular, > it contains a cluster as EBML_STOP type element. When a cluster is > encountered and ebml_parse returns, the current level is ended (if there > already was an open cluster) and the new cluster is entered (via a > designated syntax list containing a cluster as master element) and > parsed until a SimpleBlock or a BlockGroup is encountered (these two are > of course EBML_STOP elements in the used syntax). Then the first syntax > list (the one containing both the cluster as well as the elements > permitted in a cluster as entries) is used again for parsing of said > blocks and all the blocks that follow until one encounters another > cluster or an error occurs. > c) The problem with this approach is that clusters are not closed > automatically when they are finished, but only when another cluster is > encountered. This works (i.e. yields no error message) when a cluster > follows another cluster (because no checks are performed for EBML_STOP > type elements), but if e.g. the cues (the index) is located after a > cluster (as happens with most Matroska files in existence), then parsing > the cues will yield an error: After all, the preceding cluster ended > right before the cues began, but the level hasn't been ended, so the > length check will fail if the preceding cluster wasn't an > unknown-length cluster. (In the latter case, the cues are considered > part of the preceding cluster by the current code.) That's the reason > behind this (harmless) error. > > 2. a) When ebml_parse parses a Master element by itself, it also takes > care of ending the level (i.e. the level automatically entered during > parsing of the master element) itself; but there is no check at the end > of ebml_parse whether a level has just ended right after the last > element; if there were, then the above problem wouldn't exist. > b) So my approach is to add such a check and make ebml_parse end the > levels all by itself; the Matroska functions shouldn't have to end > levels manually (as happens currently when a new cluster is encountered), > but the EBML functions would do so given that the levels are an EBML > concept after all. > c) But all is slightly complicated by the fact that there also can be > unknown-length master elements. These end when an element not allowed in > them but on a higher level is encountered. So in this situation it is > impossible to perform the check whether a level has ended after > consuming an element; instead, said check needs to be performed directly > after the id of the next element is read, without reading any more data > if it turns out that said element belongs to a higher level. And that's > the way it has been implemented. > > 3. I have also added improvements to various checks and other things. > You can find them explained in the various commit messages. Just two things: > a) I am unsure how to check whether a read error or attempted reading > beyond EOF should be checked: Via avio_feof(pb) or via pb->eof_reached? > The former tries to read again (refill the buffer) when pb->eof_reached > was already set; does this mean that if one wants to know whether a read > was not successfull one should check pb->eof_reached, but when one is > more interested into whether one can perform future reads, one should > use avio_feof (after all, in case of livestreaming, new data might have > arrived after the earlier failed read)? That's at least the interpretation > that I had when I wrote the patchset. > b) ffio_limit is used to check for whether there is enough data left to > skip if an element intended to be skipped is encountered. There are > three issues with this function: > i) It takes an int, although it should be an int64_t for our needs. > ii) It emits its own error message (in case it fails) which is not > appropriate for our needs (it claims to be truncating a packet, but in > our case, no packet is truncated; instead it is treated as an indication > that an error happened and a resync is performed). > iii) For some reason it allows the remaining size to be one less than > the size given as argument. > Not understanding iii) is what made me hesitate to factor the needed > part of ffio_limit out of it and using the new function in the Matroska > demuxer. Would be nice if someone could explain the rationale behind > this. This originated (without any explanation) in commit 559ae20d and > got carried over from there since then. > > - Andreas > > [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243769.html > [2]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-March/241694.html > > > Andreas Rheinhardt (37): > avformat/matroskadec: Remove unused variables > avformat/matroskadec: Correct outdated error message > avformat/matroskadec: Compactify structure > avformat/matroskadec: Don't zero unnecessarily > avformat/matroskadec: Get rid of cluster size field assumption > avformat/matroskadec: Use generic size check for signed integers > avformat/matroskadec: Set offset of first cluster > avformat/matroskadec: Don't copy attached pictures > avformat/matroskadec: Remove redundant initialization > avformat/matroskadec: Properly check return values > avformat/matroskadec: Improve read error/EOF checks I > avformat/matroskadec: Improve read error/EOF checks II > avformat/matroskadec: Improve error/EOF checks III > avformat/matroskadec: Remove non-incremental parsing of clusters > avformat/matroskadec: Don't keep old blocks > avformat/matroskadec: Treat SimpleBlock as EBML_BIN > avformat/matroskadec: Don't abort resyncing upon seek failure > avformat/matroskadec: Add function to reset status > avformat/matroskadec: Use proper levels after discontínuity > avformat/matroskadec: Refactor some functions > avformat/matroskadec: Introduce a "last known good" position > avformat/matroskadec: Link to parents in syntax tables > avformat/matroskadec: Redo level handling > avformat/matroskadec: Make cluster parsing level compatible > avformat/matroskadec: Don't reset cluster position > avformat/matroskadec: Combine arrays > avformat/matroskadec: Redo EOF handling > avformat/matroskadec: Reuse positions > avformat/matroskadec: Typos, nits and cosmetics > avformat/matroskadec: Don't skip too much when unseekable > avformat/matroskadec: Improve invalid length error handling > avformat/matroskadec: Accept more unknown-length elements > avformat/matroskadec: Fix probing of unknown-length headers > avformat/matroskadec: Accept more unknown-length elements II > avformat/matroskadec: Reindent after previous commit > avformat/matroskadec: Use file offsets for level 1 elements > avformat/matroskadec: Improve check for level 1 duplicates > > libavformat/matroskadec.c | 1006 +++++++++++++++++++++---------------- > 1 file changed, 582 insertions(+), 424 deletions(-) > Ping for the whole patchset.
- Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".