Benjamin Kaduk has entered the following ballot position for draft-ietf-alto-performance-metrics-21: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Thank you for addressing my previous discuss points with the -21 (and my apologies for the spurious one!); I'm glad to see that they were indeed easy to address.

However, I have looked over the changes from -20 to -21 and seem to have found a couple more issues that should be addressed:

(1) I can't replicate the Content-Length values in the examples (I only looked at Examples 1 and 2). Can you please share the methodology used to generate the values? My testing involved copy/paste from the htmlized version of the draft to a file, manually editing that file to remove the leading three spaces that come from the formatting of the draft, and using Unix wc(1) on the resulting file. It looks like the numbers reported in the -21 are computed as the overall number of characters in the file *minus* the number of lines in the file, but I think it should be the number of characters *plus* the number of lines, to accommodate the HTTP CRLF line endings. (My local temporary files contain standard Unix LF (0x0a) line endings, verified by hexdump(1).)

(2) We seem to be inconsistent about what the "cur" statistical operator for the "bw-utilized" metric indicates -- in §4.4.3 it is "the current instantaneous sample", but in §4.4.4 it is somehow repurposed as "The current ("cur") utilized bandwidth of a path is the maximum of the available bandwidth of all links on the path."
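For point (1), the arithmetic I have in mind can be sketched roughly as follows. This is only an illustration: the function names and the sample body are made up for this message (the body is not taken from the draft), and it assumes the file is UTF-8 with LF line endings and that every line, including the last, ends with a line terminator.

```python
def content_length_crlf(body_lf: str) -> int:
    """Characters *plus* lines: the size once each LF is sent as CRLF."""
    data = body_lf.encode("utf-8")
    lines = data.count(b"\n")          # what `wc -l` reports
    return len(data) + lines           # `wc -c` plus one CR per line


def content_length_minus_lines(body_lf: str) -> int:
    """Characters *minus* lines: what the -21 values appear to match."""
    data = body_lf.encode("utf-8")
    return len(data) - data.count(b"\n")


# Hypothetical three-line body, standing in for an example from the draft.
sample = '{\n  "a": 1\n}\n'

if __name__ == "__main__":
    print("chars + lines:", content_length_crlf(sample))
    print("chars - lines:", content_length_minus_lines(sample))
    # Cross-check against an explicit LF -> CRLF conversion.
    print("explicit CRLF:", len(sample.replace("\n", "\r\n").encode("utf-8")))
```

Whether the final line of each example body carries a terminator changes the result by two bytes, so that detail would need to be pinned down as part of the methodology.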
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I cannot currently provide a concise explanation of the nature of my unease with the "bw-utilized" metric specification that is new in this revision (so as to elevate it to a Discuss-level concern), but I strongly urge the authors and WG to consider my comments on Section 4.4.3.

The new text in Section 1 explaining the origins of the metrics (e.g., from TE performance metrics) and why some other TE metrics are not defined is nicely done. I trust the responsible AD and WG chairs to ensure that it, and the other places where we have added new exposition, has gotten the appropriate level of review from the WG membership.

Section 3.1.2, 3.2.2

I see that the delay-ow and delay-rt semantics have been changed from milliseconds to microseconds going from -20 to -21. Either representation seems fine, but it may be risky to make such a change so late in the publication process, especially if there are already implementations in place. I also don't see any AD ballot comments that seem to motivate the change, so I'm a bit curious how it arose -- is it for consistency with the corresponding TE link metrics?

Section 3.3.3

   Intended Semantics: To specify temporal and spatial aggregated delay
   variation (also called delay jitter)) with respect to the minimum
   delay observed on the stream over the one-way delay from the
   specified source and destination, where the one-way delay is defined
   in Section 3.1. A non-normative reference definition of end-to-end
   one-way delay variation is [RFC3393]. [...]

I note that RFC 3393 explicitly says that as part of the metric, several parameters must be specified, most notably the selection function F that unambiguously defines the two packets selected for the metric.
While it's allowed for F to select as the "first" packet the one with the smallest one-way delay, which maps to the "with respect to the minimum delay observed on the stream" here, it seems to me that it's fairly important to call out that we are not allowing the full flexibility of the RFC 3393 metric. That assumes, of course, that we specifically have that as the intent, versus allowing the full generality of RFC 3393. If there have been research results since RFC 3393 was published that indicate that it's preferred to use the minimum delay for this purpose, that might be worth listing as a reference in addition to RFC 3393.

Section 3.4.4

The estimation of end-to-end loss rate as the sum of per-link loss rates is (1) only valid in the low-loss limit, and (2) assumes that each link's loss events are uncorrelated with every other link's loss events. The current text does mention (2) in the form of "should be cognizant of correlated loss rates", but I don't think it touches on (1) at all. (The general formula for aggregating loss assuming each link is independent is to compute end-to-end loss as one minus the product of the success rate for each link.)

Section 4.4.3

It seems like there may be some subtlety in the interpretation of the "bw-utilized" metric, which leads me to wonder if more caution is advised prior to adding new metrics at this stage in the document lifecycle. In particular, it seems like it would be natural to attempt to compare the "bw-utilized" value against the "bw-maxres" value and "bw-residual" value, but it seems to me that the inferences that can be made by such comparisons will depend on the topology in question.
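As a numerical aside on the Section 3.4.4 remark about loss aggregation: the sketch below compares the sum-of-link-losses estimate with the one-minus-product form under the independence assumption. The function names and the per-link loss rates are hypothetical, chosen only to show where the two estimates agree and where they diverge; nothing here is taken from the draft.

```python
import math
from typing import Sequence


def loss_sum(link_loss: Sequence[float]) -> float:
    """Additive estimate: sum of per-link loss rates."""
    return sum(link_loss)


def loss_independent(link_loss: Sequence[float]) -> float:
    """One minus the product of per-link success rates.

    Assumes loss events on different links are independent.
    """
    return 1.0 - math.prod(1.0 - p for p in link_loss)


if __name__ == "__main__":
    low = [0.001, 0.002, 0.001]   # hypothetical low-loss path
    high = [0.3, 0.4, 0.5]        # hypothetical very lossy path

    # Low loss: the two estimates differ only in the cross terms.
    print(loss_sum(low), loss_independent(low))

    # High loss: the sum exceeds 1.0 and is no longer a probability,
    # while the product form stays within [0, 1].
    print(loss_sum(high), loss_independent(high))
```

With the low-loss inputs the two values agree to within a few parts per million, while with the lossy inputs the sum comes out at 1.2 against roughly 0.79 for the product form, which is the gap that caveat (1) is about.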
Consider, for example,

   Routers and link capacities between them:

          1Gbps            10Gbps           1Gbps
   +-----------------+=================+--------------+
   A                 B                 C              D

If there is a flow using 6Gbps from B to C, that would show up when querying "bw-utilized" between A and D, but that 6Gbps is obviously more than both the maximum reservable and residual bandwidth end-to-end from A to D; likewise, the 4Gbps of residual bandwidth on the B-to-C link is also more than the achievable bandwidth end-to-end from A to D. So it seems like the utilized bandwidth is potentially from totally unrelated flows on paths that only have a minimal set of links in common with the path being queried.

How do we expect someone to use the reported "bw-utilized" values?

To put it differently, I don't think that the specification of "the maximum utilized bandwidth among all links from the source to the destination" will actually provide the desired "utilized bandwidth of the path from the source to the destination", since the procedure as stated can report a bandwidth that corresponds to a different path.

NITS

Section 1

s/"Semantics Base On" column/"Semantics Based On" column/ (in the prose, first paragraph after the table).

Section 4.3

The section heading has a typo: s/Availlble/Available/

_______________________________________________
alto mailing list
alto@ietf.org
https://www.ietf.org/mailman/listinfo/alto