[alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-21: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker Mon, 20 Dec 2021 16:58:39 -0800

Benjamin Kaduk has entered the following ballot position for
draft-ietf-alto-performance-metrics-21: Discuss


When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Thank you for addressing my previous discuss points with the -21 (and my
apologies for the spurious one!); I'm glad to see that they were indeed
easy to address.

However, I have looked over the changes from -20 to -21 and seem to have
found a couple more issues that should be addressed:

(1) I can't replicate the Content-Length values in the examples (I only
looked at Examples 1 and 2).  Can you please share the methodology used
to generate the values?  My testing involved copy/paste from the
htmlized version of the draft to a file, manually editing that file to
remove the leading three spaces that come from the formatting of the
draft, and using Unix wc(1) on the resulting file.  It looks like the
numbers reported in the -21 are computed as the overall number of
characters in the file *minus* the number of lines in the file, but I
think it should be the number of characters *plus* the number of lines,
to accommodate the HTTP CRLF line endings.  (My local temporary files
contain standard Unix LF (0x0a) line endings, verified by hexdump(1).)
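
A minimal sketch of the arithmetic in point (1), assuming the message
body is sent with CRLF line endings as the point supposes.  The helper
name and the sample body are hypothetical and are not taken from the
draft's examples:

```python
def content_length_crlf(body: str) -> int:
    """Byte length of `body` once every line ending is sent as CRLF."""
    # Normalize first so input that already uses CRLF is not double-converted.
    lf_only = body.replace("\r\n", "\n")
    return len(lf_only.replace("\n", "\r\n").encode("utf-8"))


# Hypothetical LF-terminated body, standing in for a copied example.
sample = '{\n  "meta": {}\n}\n'
chars = len(sample.encode("utf-8"))  # what `wc -c` reports for the LF file
lines = sample.count("\n")           # what `wc -l` reports

# Each LF grows by one byte when it becomes CRLF: characters *plus* lines.
assert content_length_crlf(sample) == chars + lines
```

Under that assumption the value is the wc(1) character count plus the
line count, not the character count minus the line count.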

(2) We seem to be inconsistent about what the "cur" statistical operator
for the "bw-utilized" metric indicates -- in §4.4.3 it is "the current
instantaneous sample", but in §4.4.4 it is somehow repurposed as "The
current ("cur") utilized bandwidth of a path is the maximum of the
available bandwidth of all links on the path."


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I cannot currently provide a concise explanation of the nature of my
unease with the "bw-utilized" metric specification that is new in this
revision (so as to elevate it to a Discuss-level concern), but I
strongly urge the authors and WG to consider my comments on Section 4.4.3.

The new text in Section 1 explaining the origins of the metrics (e.g.,
from TE performance metrics) and why some other TE metrics are not
defined is nicely done.  I trust the responsible AD and WG chairs to
ensure that it, and the other places where we have added new exposition,
has gotten the appropriate level of review from the WG membership.

Section 3.1.2, 3.2.2

I see that the delay-ow and delay-rt semantics have been changed from
milliseconds to microseconds going from -20 to -21.  Either
representation seems fine, but it may be risky to make such a change so
late in the publication process, especially if there are already
implementations in place.  I also don't see any AD ballot comments that
seem to motivate the change, so I'm a bit curious how it arose -- is it
for consistency with the corresponding TE link metrics?

Section 3.3.3

   Intended Semantics: To specify temporal and spatial aggregated delay
   variation (also called delay jitter)) with respect to the minimum
   delay observed on the stream over the one-way delay from the
   specified source and destination, where the one-way delay is defined
   in Section 3.1.  A non-normative reference definition of end-to-end
   one-way delay variation is [RFC3393].  [...]

I note that RFC 3393 explicitly says that as part of the metric, several
parameters must be specified, most notably the selection function F that
unambiguously defines the two packets selected for the metric.  While
it's allowed for F to select as the "first" packet the one with the
smallest one-way delay, which maps onto the "with respect to the
minimum delay observed on the stream" here, it seems to me that it's
fairly important to call out that we are not allowing the full
flexibility of the RFC 3393 metric.  Assuming, of course, that we
specifically have that as the intent, versus allowing the full
generality of RFC 3393.  If there have been research results since
RFC 3393 was published that indicate that it's preferred to use the
minimum delay for this purpose, they might be worth listing as a
reference in addition to RFC 3393.
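
To illustrate why the choice of selection function matters, here is a
rough sketch that compares two possible choices of F over the same
stream of one-way delays.  The function names and the sample delays are
hypothetical; neither is taken from the draft or from RFC 3393:

```python
def variation_vs_minimum(delays_us):
    """Delay variation of each packet relative to the stream's minimum delay."""
    d_min = min(delays_us)
    return [d - d_min for d in delays_us]


def variation_consecutive(delays_us):
    """Delay variation between each packet and the one sent just before it."""
    return [later - earlier for earlier, later in zip(delays_us, delays_us[1:])]


# Hypothetical one-way delays in microseconds for four packets of one stream.
delays_us = [100, 130, 90, 120]

vs_min = variation_vs_minimum(delays_us)         # never negative
consecutive = variation_consecutive(delays_us)   # can be negative

assert min(vs_min) == 0
assert any(v < 0 for v in consecutive)
```

The two choices give different value ranges for the same stream, which
is why it helps to state which one the metric intends.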

Section 3.4.4

The estimation of end-to-end loss rate as the sum of per-link loss rates
is (1) only valid in the low-loss limit, and (2) assumes that each
link's loss events are uncorrelated with every other link's loss events.
The current text does mention (2) in the form of "should be cognizant of
correlated loss rates", but I don't think it touches on (1) at all.
(The general formula for aggregating loss assuming each link is
independent is to compute end-to-end loss as one minus the product of
the success rate for each link.)
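
A short sketch comparing the two aggregation rules, assuming independent
per-link loss.  The function names and the sample loss rates are
hypothetical:

```python
import math


def loss_by_sum(link_losses):
    """Approximate end-to-end loss as the sum of per-link loss rates."""
    return sum(link_losses)


def loss_independent(link_losses):
    """End-to-end loss as one minus the product of per-link success rates."""
    return 1.0 - math.prod(1.0 - p for p in link_losses)


low = [0.001, 0.002]    # low-loss links: the two rules nearly agree
high = [0.5, 0.5, 0.5]  # high-loss links: the sum exceeds 1

# Low-loss limit: the sum is a close approximation of the general formula.
assert math.isclose(loss_by_sum(low), loss_independent(low), abs_tol=1e-5)

# High loss: the sum is not a valid probability; the product form still is.
assert loss_by_sum(high) > 1.0
assert 0.0 <= loss_independent(high) <= 1.0
```

The sum only tracks the general formula while the cross terms (products
of per-link loss rates) are negligible.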

Section 4.4.3

It seems like there may be some subtlety in the interpretation of the
"bw-utilized" metric, which leads me to wonder if more caution is
advised prior to adding new metrics at this stage in the document
lifecycle.  In particular, it seems like it would be natural to attempt
to compare the "bw-utilized" value against the "bw-maxres" value and
"bw-residual" value, but it seems to me that the inferences that can be
made by such comparisons will depend on the topology in question.
Consider, for example,

Routers and link capacities between them:

       1Gbps            10Gbps            1Gbps
   +-----------------+=================+--------------+
   A                 B                 C              D

If there is a flow using 6Gbps from B to C, that would show up when
querying "bw-utilized" between A and B, but that 6Gbps is obviously more
than both the maximum reservable and residual bandwidth end-to-end from
A to D; likewise, the 4Gbps of residual bandwidth on the B-to-C link is
also more than the achievable bandwidth end-to-end from A to D.  So it
seems like the utilized bandwidth is potentially from totally unrelated
flows on paths that only have a minimal set of links in common with the
path being queried.  How do we expect someone to use the reported
"bw-utilized" values?

To put it differently, I don't think that the specification of "the
maximum utilized bandwidth among all links from the source to the
destination" will actually provide the desired "utilized bandwidth of
the path from the source to the destination", since the procedure as
stated can report a bandwidth that corresponds to a different path.
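
The A-to-D example can be worked through in a small sketch.  The link
capacities and the 6Gbps B-to-C flow come from the example above; the
zero utilization on the A-B and C-D links, the use of capacity minus
utilization as a link's residual bandwidth, and all names are
assumptions made only for illustration:

```python
# (from, to, capacity in Gbps, utilized in Gbps); A-B and C-D are assumed idle.
path_a_to_d = [
    ("A", "B", 1.0, 0.0),
    ("B", "C", 10.0, 6.0),
    ("C", "D", 1.0, 0.0),
]


def max_utilized(links):
    """The quoted rule: maximum utilized bandwidth among all links on the path."""
    return max(utilized for _, _, _, utilized in links)


def bottleneck_capacity(links):
    """Smallest link capacity on the path, an upper bound on end-to-end rate."""
    return min(capacity for _, _, capacity, _ in links)


def min_residual(links):
    """Smallest per-link residual (capacity minus utilized) on the path."""
    return min(capacity - utilized for _, _, capacity, utilized in links)


# The reported value exceeds what any A-to-D flow could ever carry.
assert max_utilized(path_a_to_d) > bottleneck_capacity(path_a_to_d)
assert max_utilized(path_a_to_d) > min_residual(path_a_to_d)
```

Under these assumptions the rule reports 6Gbps for a path whose
bottleneck is 1Gbps, so the reported value cannot describe traffic that
actually traverses the path end to end.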


NITS

Section 1

s/"Semantics Base On" column/"Semantics Based On" column/ (in the prose,
first paragraph after the table).

Section 4.3

The section heading has a typo: s/Availlble/Available/



_______________________________________________
alto mailing list
alto@ietf.org
https://www.ietf.org/mailman/listinfo/alto
