Hello Steve, Thank you for addressing my comments. I am okay with your suggestions.
Best regards, Ines On Sun, Oct 19, 2025 at 1:14 PM Steve Lhomme <[email protected]> wrote: > Hello Ines, > > Thanks a lot for your review. Replies on your comments can be found inline > below. > > On 13 Oct 2025, at 17:07, Ines Robles via Datatracker <[email protected]> > wrote: > > Document: draft-ietf-cellar-tags > Title: Matroska Media Container Tag Specifications > Reviewer: Ines Robles > Review result: Almost Ready > > I am the assigned Gen-ART reviewer for this draft. The General Area > Review Team (Gen-ART) reviews all IETF documents being processed > by the IESG for the IETF Chair. Please treat these comments just > like any other last call comments. > > For more information, please see the FAQ at > > <https://wiki.ietf.org/en/group/gen/GenArtFAQ>. > > Document: draft-ietf-cellar-tags-19 > Reviewer: Ines Robles > Review Date: 2025-10-13 > IETF LC End Date: 2025-10-13 > IESG Telechat date: Not scheduled for a telechat > > Summary: > > This document defines the Matroska multimedia container tags, namely the > tag > names and their respective semantic meaning. > > I have a few comments and questions below that I would appreciate being > addressed before publication. > > Comments: > > 1- Section 3.2.2, states "Multiple items MUST NOT be stored as a list in a > single TagString. If there is more than one tag value with the same name > to be > stored, then more than one SimpleTag MUST be used." > > However, several tag definitions (for example, INSTRUMENTS in Section 4.4 > and > KEYWORDS in Section 4.6) explicitly describe values as being “separated by > a > comma.” This wording suggests that multiple items may appear within a > single > TagString, which seems to contradict the rule in Section 3.2.2. > > Could you please clarify whether these tags are intended to be exceptions > to > that rule, or if the text should instead indicate that each value must be > stored in a separate SimpleTag? > > > The general idea is that the whole Tag element can be seen as a database. > So if you’re looking for a particular artist you should just look for > ARTIST=someone, even the content includes many artists. This is easier that > looking for a particular string which may or may not have the comma escaped > (and we have to define how to escape it). > > The fact the splitting is mandatory is very new > https://github.com/ietf-wg-cellar/matroska-specification/pull/1021 > And as said in the next paragraph, it’s possible some old tags may have > been combined, except they did not have proper formatting rules. And the > next paragraph also mentions that some values may be combined from an other > source. So although it’s a MUST, it’s a rule that can't be 100% enforced. > Maybe we should turn it into a recommended ? > It used to be RECOMMENDED but was changed after Orie’s suggestion. See > https://mailarchive.ietf.org/arch/msg/cellar/slnYqYQ2sy1w48lA3ENyuvnoYWA/ > Maybe we went too far here. > > INSTRUMENTS and KEYWORDS have always had the separator properly defined so > they should be allowed, although it’s not ideal for parsing if the comma is > supposed to be in the name of an instrument or a keyword (very unlikely). > > I made a proposal to go back to Recommended, even for instruments and > keywords: > https://github.com/ietf-wg-cellar/matroska-specification/pull/1030 > > 2- Section 3.3: In Table 2 (“TargetTypeValue for Video”), the draft lists > MOVIE > / EPISODE / CONCERT and describes them as “the most common grouping level > of > video (e.g., an episode for a TV series).” This correctly indicates that > movie > is intended as a representative example. > > However, in the document, several tag descriptions (e.g., DIRECTOR, ACTOR, > LAW_RATING, etc.) refer specifically to “a movie.” > > For precision and inclusivity, these occurrences should be generalized, > since > the tagging system applies to any audiovisual work; including films, > television > episodes, animated content, image-based sequences, podcasts, concerts, or > other > recorded video content. > > It is therefore suggested to replace movie with a broader term such as > video > work, video content, or audiovisual work, as appropriate to the context. > > What do you think? > > > The term “MOVIE” is value that has been defined as such for a very long > time. > It is defined in RFC9559 section 5.1.8.1.1.2 ( > https://www.rfc-editor.org/rfc/rfc9559#section-5.1.8.1.1.2). > So changing it is not an option now. It is possible to add new values via > a document update. However the TargetType element is just marked as > informational. It’s mostly there to give more sense to the TargetTypeValue > element which is mandatory. > > Although the list of values is restricted in RFC9559 there is no IANA > registry defined to add more values. > > 3- Section 3.3, states: “Tags from a TargetTypeValue apply to the all lower > TargetTypeValues.” > > It is not always clear whether “lower” refers to numerically smaller > values or > to semantically subordinate entities. It is implicit that smaller numbers > indicate lower levels in the hierarchy; however, the current wording could > confuse newcomers. > > What about to add a clarification such as: > > “A tag defined for a given TargetTypeValue applies to all Targets with > numerically smaller TargetTypeValues in the same hierarchy, that is, from > higher-level groups to lower-level entities.” > > What do you think? > > > Yes, it’s clearer. I proposed your changes here (with a slight > modification): > https://github.com/ietf-wg-cellar/matroska-specification/pull/1031 > > 4- Section 3.3 defines TargetTypeValue and provides two tables: Table 1 for > audio and Table 2 for video. Both tables list the same numeric values > (e.g., > 50, 40, 30, etc.) but associate them with different semantic examples. For > instance, Table 1 maps 50 to Album, while Table 2 maps 50 to Movie / > Episode / > Concert. > > It would be helpful to clarify whether these tables represent one shared > TargetTypeValue numbering system that applies to all media types (where the > numbers define structural hierarchy levels, and the examples simply > illustrate > common use cases for each media type), or two independent numbering systems > (one for audio and one for video) that happen to reuse the same numeric > values > for different purposes. > > For example, how should this be interpreted in a Matroska file that > contains > both audio and video streams, such as a concert film? > > > As said above, the TargetType is an informational string for human > readability. If you strip those, the TargetTypeValue remains (and is > mandatory). > In that case you can’t tell whether the described content is a concert or > a movie or an album. If a TagTrackUUID ( > https://www.rfc-editor.org/rfc/rfc9559#name-tagtrackuid-element) > is used then you can tell if this is audio or video content you’re > describing but that’s the only case. > > So from your question, the tables do represent one shared TargetTypeValue > numbering system that applies to all media types. > > 5- Section 3.3.1: The current description of PART_OFFSET (“... which is the > number of tracks on the first CD”) correctly implies that it represents a > cumulative or absolute offset, i.e., the number of lower-level items that > precede the current group in the overall collection. To avoid potential > misinterpretation as a relative (per-disc) offset, it might be clearer to > rephrase to something like: > > “PART_OFFSET, at TargetTypeValue 30 (TRACK), represents the number of > lower-level items that precede the current group in the overall > collection. For > example, if CD 1 contains 5 tracks, then the first track of CD 2 has > PART_OFFSET = 5.” > > What do you think? > > > I don’t really understand how it can be misinterpreted. Also this > new/different example is less clear IMO because it defines the PART_OFFSET > for the first track of the second CD. But the PART_OFFSET is always the > same value for all tracks of the second CD. That’s why in the current > example the third track of the second CD is described, to really show that > the PART_OFFSET only depends on the number of tracks on the first CD. > > 6- Section 4.10: It appears to be an inconsistent treatment of numeric tags > with respect to their encoding type. > > For example: The EBU_R128_* tags (e.g., EBU_R128_LOUDNESS) are defined as > binary and store floating-point values in <TagBinary>. The REPLAYGAIN_* > tags > (e.g., REPLAYGAIN_GAIN, REPLAYGAIN_PEAK) represent similar floating-point > values but are defined as UTF-8 strings in <TagString>. This means that two > groups of tags describing essentially the same kind of data (gain/loudness > values in dB or LUFS) are stored using different data types. > > > This document defines already existing tags. The REPLAYGAIN ones have been > there for a long time > and have been defined as strings which can sometimes have a unit. > The EBU_R128 tags are new and we realized parsing what is just a floating > number (which is strict unit) would be easier and safer handled a floating > point in a binary blob. > > So it’s just for historical reasons. The new one being better and easier > to handle. > > 6.1- Could you please clarify whether this distinction is intentional (for > example, due to backward compatibility) or whether a consistent approach is > intended? > > > Yes, totally intentional. > > 6.2- It might be helpful to include a short explanatory note in Section > 4.10 > such as "..ReplayGain tags retain textual representation for compatibility > with > legacy implementations, whereas EBU R128 tags use binary floats for higher > precision..."? > > > OK. I proposed some explanation and recommendation in > https://github.com/ietf-wg-cellar/matroska-specification/pull/1032 > > 6.3- Additionally, it may be useful to provide brief guidance for future > tag > definitions on when to prefer binary versus textual representation for > numeric > values. For example, recommending binary floats for precision-critical > engineering data, and UTF-8 strings for human-readable or legacy-compatible > values. This would help ensure consistent design choices in future > extensions. > > > We may not want to go too deep in explanation on why binary is better. We > can’t be sure > this is always the case. I think saying the binary format is “stricter” > and “RECOMMENDED” should do it. > > 7- Section 5, states: "Most of the time strings are kept as-is and don't > pose a > security issue, apart from invalid UTF-8 values." > > While the mention of “invalid UTF-8 values” is helpful, this phrasing might > still understate the potential risk. Implementations that handle TagStrings > without proper UTF-8 validation or size checks could encounter parsing > errors, > crashes, or buffer overruns if presented with malformed or excessively > large > input data. It may be useful to add a clarifying sentence such as: > > "Implementations MUST validate TagString inputs for UTF-8 correctness and > reasonable length before use, in accordance with the security > considerations in > [RFC 3629]" > > What do you think? > > > Section 5 also mentions it inherits Security Considerations from RFC 8794 > and RFC 9559. > UTF-8 is somehow covered by EBML (RFC 8794) > https://www.rfc-editor.org/rfc/rfc8794.html#name-security-considerations > > It does mention out of bounds and buffer overflow issues. But additional > concern and sources could be used. > I proposed your addition in > https://github.com/ietf-wg-cellar/matroska-specification/pull/1033 > > 8- The draft describes how multiple SimpleTag elements may appear under the > same Tag element, allowing multiple values for the same tag name. > > However, how should applications interpret or prioritize these values if > conflicting tags occur. For example, two TITLE tags with different > TagString > values within the same Targets element? > > > It’s implied by the 3.2.2 section on splitting. Multiple values are split > with the same tag name (eg TITLE) which a single value in each. > Maybe it can be more explicitly said that in that case the values “add up” > to the list of values. In other words they don’t conflict. They all apply. > > Nits: > > 9- choregrapher → choreographer > > > OK > > 10- the values is stored → the value is stored > > > OK > > 11- parts that are inside or outside a given file → ambiguous. Consider > clarifying to something like: “parts located either within or externally > referenced by a given file” ? > > > I used the within word and updated the following example to clarify what > we mean by “outside”: > > The `TargetTypeValue` element allows tagging of different parts that are > within or outside a > given file. For example, in an audio file with one song you could have > information about > the album it comes from the CD set even if the whole CD set is not found > in the file. > > 12- Due to the various nature of tag sources → Due to the varied nature of > tag > sources > > > OK > > 13- each demand needs to balance if it makes sense… → each request needs > to be > evaluated to determine if it makes sense… > > > OK > > 14- an host app → a host app > > > OK > > 15- A Tag element has a single Targets element with a single > TargetTypeValue > element. But the Targets element… → replace “But..” with “However,...” > > > OK > > 16- It is RECOMMENDED to start a tag name… → It is RECOMMENDED that tag > names > start… > > > OK > > 17- for non official tags than are not meant to make it to the list… → for > non-official tags that are not meant to be added to the list of official > tags... > > > OK > > 18- apply to the all lower TargetTypeValues → “…apply to all lower > TargetTypeValues..” > > > OK. > > All nit changes have been proposed in > https://github.com/ietf-wg-cellar/matroska-specification/pull/1034 > > Thanks for this document, > > > Thanks for the review! > Steve > > Ines. > > > > >
_______________________________________________ Gen-art mailing list -- [email protected] To unsubscribe send an email to [email protected]
