Hi Qin,
On Thu, Dec 02, 2021 at 09:04:18AM +0000, Qin Wu wrote:
> Thanks Ben for the detailed, valuable review; see replies and clarifications below.
>
> -----Original Message-----
> >From: Benjamin Kaduk via Datatracker [mailto:[email protected]]
> >Sent: December 2, 2021 13:05
> >To: The IESG <[email protected]>
> >Cc: [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected]
> >Subject: Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20:
> >(with DISCUSS and COMMENT)
>
> >Benjamin Kaduk has entered the following ballot position for
> >draft-ietf-alto-performance-metrics-20: Discuss
>
> >When responding, please keep the subject line intact and reply to all email
> >addresses included in the To and CC lines. (Feel free to cut this
> >introductory paragraph, however.)
>
>
> >Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
> >for more information about how to handle DISCUSS and COMMENT positions.
>
>
> >The document, along with other ballot positions, can be found here:
> >https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
>
>
>
> >----------------------------------------------------------------------
> >DISCUSS:
> >----------------------------------------------------------------------
>
> >These should all be trivial to resolve -- just some minor internal
> >inconsistencies that need to be fixed before publication.
>
> >The discussion of percentile statistical operator in §2.2 is internally
> >inconsistent -- if the percentile number must be an integer, then p99.9 is
> >not valid.
> [Qin Wu] Yes, the percentile is a number following the letter 'p', but in
> some cases, when high precision is needed, this percentile number will be
> further followed by an optional decimal part. The decimal part should start
> with the '.' separator. Maybe the separator caused your confusion. See the
> definition in Section 2.2 for details:
> "
> percentile, with letter 'p' followed by a number:
> gives the percentile specified by the number following the letter
> 'p'. The number MUST be a non-negative JSON integer in the range
> [0, 100] (i.e., greater than or equal to 0 and less than or equal
> to 100), followed by an optional decimal part, if a higher
> precision is needed. The decimal part should start with the '.'
> separator (U+002E), and followed by a sequence of one or more
> ASCII numbers between '0' and '9'.
> "
> Let us know if you think the separator should be changed or whether you can
> live with the current form.
Oops, that's my mistake and you are correct. Sorry about that; I agree
that no change is needed here.
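(As an aside, the quoted definition is easy to check mechanically. Below is a rough Python sketch of a validator for the percentile operator string; the function name and regex are illustrative, not from the draft.)

```python
import re

# 'p' followed by an integer in [0, 100], optionally followed by a '.'
# separator (U+002E) and one or more ASCII digits, per Section 2.2.
_PERCENTILE_RE = re.compile(r"^p([0-9]+)(\.[0-9]+)?$")

def is_valid_percentile_operator(s: str) -> bool:
    m = _PERCENTILE_RE.match(s)
    # The overall value (integer plus optional decimal part) must stay
    # within [0, 100].
    return m is not None and 0.0 <= float(s[1:]) <= 100.0
```

Under this reading, "p99", "p99.9", and "p0" are valid, while "p101", "p.9", and "p100.5" are not.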
> >Also, the listing of "cost-source" values introduced by this document (in
> >§5.1) does not include "nominal", but we do also introduce "nominal".
> [Qin Wu] I agree with this inconsistency; it should be fixed in the next
> version. Thanks.
> >Similarly, in §3.1.3 we refer to the "-<percentile>" component of a cost
> >metric string, that has been generalized to an arbitrary statistical
> >operator.
> [Qin Wu] No, it is not an arbitrary statistical operator. We did add a
> statement saying:
> "
> Since the identifier
> does not include the -<percentile> component, the values will
> represent median values.
> "
> The median value has been defined in Section 2.2 as the mid-point of the
> observations; see the median definition:
> "
> median:
> the mid-point (i.e., p50) of the observations.
> "
Hmm, I am not sure whether my point came through properly or not. Let me
try again.
In Section 3.1.3, we see the text:
Comment: Since the "cost-type" does not include the "cost-source"
field, the values are based on "estimation". Since the identifier
does not include the -<percentile> component, the values will
represent median values.
This is the only place in the document where the string "<percentile>"
appears, and in particular we do not define a "percentile component"
anywhere that I can see. We do, however, define a "statistical operator"
string (component) of a cost metric string, in Section 2.2. In particular,
we do have options for the statistical operator string that are *not*
representable as percentile values, such as stddev and cur. So, I think it
is inaccurate to write "-<percentile>" component here. I propose to
instead say "Since the identifier does not include a statistical operator
component, the values will represent median values."
> >----------------------------------------------------------------------
> >COMMENT:
> >----------------------------------------------------------------------
>
> >All things considered, this is a pretty well-written document that was easy
> >to read. That helped a lot as I reviewed it, especially so on a week with a
> >pretty full agenda for the IESG telechat.
>
> >Section 2.2
>
> >Should we say anything about how to handle a situation where a base metric
> >identifier is so long that the statistical operator string cannot be
> >appended while remaining under the 32-character limit?
> [Qin Wu] I think the base metric identifier should not be randomly selected;
> the full name of the base metric is not recommended, and a short name or
> abbreviation should probably be used if the cost metric string is too long.
> But I am not sure we should set a rule for this. Maybe the rule "The total
> length of the cost metric string MUST NOT exceed 32 characters" defined in
> RFC 7285 is sufficient?
As far as formal requirements go, that may be all we need. Assuming that
no one needs a percentile value with more than two digits of precision
after the decimal point, the longest statistical operator component we
currently define is seven characters, e.g., "-median". So if someone
happens to define a base metric identifier that's more than 25 characters,
we set ourselves up for a situation where we can use the base metric but
can't use -median, -stddev, or -stdvar. If it's 28 characters or fewer we
could still use -cur, -min, -max, etc., which would be a rather strange
situation to be in!
I suspect that the right practical approach, if this situation ever arose,
would be to define a new base metric identifier that's an alias for the
existing one -- just a shorter name but with the same semantics. So we
might end up with some text like:
% RFC 7285 limits the overall cost metric identifier to 32 characters. The
% cost metric variants with statistical operator suffixes defined by this
% document are also subject to the same overall 32-character limit, so
% certain combinations of (long) base metric identifier and statistical
% operator will not be representable. If such a situation arises, it could
% be addressed by defining a new base metric identifier that is an "alias"
% of the desired base metric, with identical semantics and just a shorter
% name.
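(To make the length arithmetic above concrete, here is a small Python sketch; the names are illustrative, and the 32-character limit is the one from RFC 7285.)

```python
MAX_COST_METRIC_LEN = 32  # overall limit on cost metric strings (RFC 7285)

def can_append_operator(base_metric: str, operator: str) -> bool:
    # 'operator' includes its leading '-', e.g. "-median" or "-cur".
    return len(base_metric) + len(operator) <= MAX_COST_METRIC_LEN
```

A 25-character base metric still fits "-median" (25 + 7 = 32), a 26-character one does not, and a 28-character one still fits "-cur" (28 + 4 = 32).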
> > min:
> > the minimal value of the observations.
> > max:
> > the maximal value of the observations.
> > [...]
>
> >Should we say anything about what sampling period of observations is in
> >scope for these operators?
> [Qin Wu] I think the sampling period of observation is related to the Method
> of Measurement or Calculation. Based on earlier discussion and agreement in
> the group, we believe this depends more on the measurement methodology or
> metric definition, and specifying it here is in some cases not necessary or
> feasible; we can look into the metric definition RFCs for more details. See
> the clarification in Section 2.
Okay, that's a good way to handle it.
> >Section 3.x.4
>
> >If we're going to be recommending that implementations link to external
> >human-readable resources (e.g., for the SLA details of estimation
> >methodology), does the guidance from BCP 18 in indicating the language
> >of text come into play?
(This was a separate point from the following paragraph, to be clear. I
don't have a good answer to propose.)
> >It's also a bit surprising that we specify the new fields in the
> >"parameters" of a metric just in passing in the prose, without a more
> >prominent indication that we're defining a new field.
> [Qin Wu] See the CostContext definition in Section 2.1; "parameters" is
> included in the CostContext object.
Ah. I think I forgot that the "parameters" were new in this document;
sorry about that.
> >Section 3.1.4
>
> > "nominal": Typically network one-way delay does not have a nominal
> > value.
>
> >Does that mean that they MUST NOT be generated, or that they should be
> >ignored if received, or something else? (Similarly for the other sections
> >where we say the same thing.)
> [Qin Wu] Yes, that is my understanding. We can add a statement to make this
> behavior clear.
>
> > This description can be either free text for possible presentation to
> > the user, or a formal specification; see [IANA-IPPM] for the
> > specification on fields which should be included. [...]
>
> >Is the IANA registry really the best reference for what fields to include?
> >Tpically we would only refer to the registry when we care about the current
> >state of registered values, but the need here seems to effectively be >the
> >column headings of the registry, which could be obtained from the RFC
> >defining the registry.
> [Qin Wu] This IANA registry provides the Metric Name and Metric URI, and
> clicking through the URI provides more details of the measurement
> methodology. That is why the [IANA-IPPM] reference was selected; maybe we
> can make this clearer in the text.
>
> >Section 3.3.3
>
> > Intended Semantics: To specify spatial and temporal aggregated delay
> > variation (also called delay jitter)) with respect to the minimum
> > delay observed on the stream over the one-way delay from the
> > specified source and destination. The spatial aggregation level is
> > specified in the query context (e.g., PID to PID, or endpoint to
> > endpoint).
>
> >I do appreciate the note about how this is not the normal statistics
> >variation that follows this paragraph, but I also don't think this is a
> >particularly clear or precise specification for how to produce the number
> >that is to be reported. It also doesn't seem to fully align with the prior
> >art in the IETF, e.g., RFC 3393. It seems like it would be highly
> >preferable to pick
> >an existing RFC and refer to its specification for computing a
> >delay variation value. (To be clear, such a reference would then be a
> >normative reference.)
> [Qin Wu] Agreed; we are not introducing a new metric, we just expose the
> existing metric defined in RFC 3393. I also agree to move RFC 3393 to the
> normative references; we will see how to fix this.
> >Section 3.4.3
>
> > Intended Semantics: To specify the number of hops in the path from
> > the specified source to the specified destination. The hop count is
> > a basic measurement of distance in a network and can be exposed as
> > the number of router hops computed from the routing protocols
> > originating this information. [...]
>
> >It seems like this could get a little messy if there are multiple routing
> >protocols in use (e.g., both normal IP routing and an overlay network, as
> >for service function chaining or other overlay schemes).
> >I don't have any suggestions for disambiguating things, though, and if the
> >usage is consistent within a given ALTO Server it may not have much impact
> >on the clients.
> [Qin Wu] Hop count has been implicitly mentioned in RFC 7285; this document
> specifies the metric explicitly.
> I am thinking that which routing protocol is used can be indicated in the
> link (a field named "link") providing a URI to a description of the
> "estimation" method.
> >Section 3.4.4
>
> > "sla": Typically hop count does not have an SLA value.
>
> >As for "nominal", earlier, is there any guidance to give on not generating
> >it or what to do if it is received?
> > (Also appears later, I suppose.)
> [Qin Wu] Will see how to provide guidance on this, thanks.
> >Section 4.1.4
>
> > "estimation": The exact estimation method is out of the scope of this
> > document. See [Prophet] for a method to estimate TCP throughput. It
> > is RECOMMENDED that the "parameters" field of an "estimation" TCP
> > throughput metric provides two fields: (1) a congestion-control
> > algorithm name (a field named "congestion-control-alg"); and (2) a
> > link (a field named "link")to a description of the "estimation"
> > method. Note that as TCP congestion control algorithms evolve (e.g.,
> > TCP Cubic Congestion Control [I-D.ietf-tcpm-rfc8312bis]), it helps to
> > specify as many details as possible on the congestion control
> > algorithm used. This description can be either free text for
> > possible presentation to the user, or a formal specification. [...]
>
> >Do these specifics go into the "congestion-control-alg" name, or in the
> >linked content?
> [Qin Wu] My understanding is the latter, but the two fields will be provided
> in one "parameters" field, which can be seen as a JSON object since
> "parameters" is the plural of "parameter".
I was hoping it would be the latter :)
Maybe add a clause at the end of the last quoted sentence like ", as part of
the linked contents"?
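(For illustration, the RECOMMENDED "parameters" object could look like the following Python sketch; the algorithm name and URL are made-up values, not from the draft.)

```python
import json

# Hypothetical "cost-context" carrying the two RECOMMENDED fields for an
# "estimation" TCP throughput metric; details of the congestion control
# algorithm would live in the linked contents, not in the name itself.
cost_context = {
    "cost-source": "estimation",
    "parameters": {
        "congestion-control-alg": "cubic",
        "link": "https://alto.example.net/doc/tput-estimation.html",
    },
}
print(json.dumps(cost_context, indent=2))
```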
> >Section 5.3
>
> > To address the backward-compatibility issue, if a "cost-metric" is
> > "routingcost" and the metric contains a "cost-context" field, then it
> > MUST be "estimation"; if it is not, the client SHOULD reject the
> > information as invalid.
>
> >This seems like a sub-optimal route to backwards compatibility, as it would
> >(apparently) permanently lock the "routingcost" metric to only the
> >"estimation" source with no way to negotiate more flexibility. Unless we
> >define a new "routingcost2" metric that differs only in the lack of this
> >restriction, of course.
> [Qin Wu] Probably we should have a default value for the "cost-context";
> I think the default value is "estimation", since legacy clients only support
> metric estimation.
> >Section 5.4.1
>
> > the ALTO server may provide the client with two pieces of additional
> > information: (1) when the metrics are last computed, and (2) when the
> > metrics will be updated (i.e., the validity period of the exposed
> > metric values). The ALTO server can expose these two pieces of
> > information by using the HTTP response headers Last-Modified and
> > Expires.
>
> >While this seems like it would work okay in the usual case, it seems a bit
> >fragile, in that it may fail in boundary cases, such as when a server is
> >just starting up. I would lean towards recommending use of explicit data
> >items to convey this sort of information (and also the overall measurement
> >interval over which statistics are computed, which may not always go back to
> >"the start of time").
> [Qin Wu] Okay.
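(As a sketch of the header-based approach being discussed; the timestamp and validity period below are made-up values.)

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# When the metrics were last computed, and how long the exposed values
# remain valid (both hypothetical).
computed_at = datetime(2021, 12, 2, 9, 0, 0, tzinfo=timezone.utc)
validity_period = timedelta(minutes=5)

response_headers = {
    "Last-Modified": format_datetime(computed_at, usegmt=True),
    "Expires": format_datetime(computed_at + validity_period, usegmt=True),
}
```

Explicit data items in the response body (e.g., a measurement-interval field) would avoid the boundary cases noted above, such as a freshly started server.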
> >Section 5.4.2
>
> > often be link level. For example, routing protocols often measure
> > and report only per link loss, not end-to-end loss; similarly,
> > routing protocols report link level available bandwidth, not end-to-
> > end available bandwidth. The ALTO server then needs to aggregate
> > these data to provide an abstract and unified view that can be more
> > useful to applications. The server should consider that different
> > metrics may use different aggregation computation. For example, the
> > end-to-end latency of a path is the sum of the latency of the links
> > on the path; the end-to-end available bandwidth of a path is the
> > minimum of the available bandwidth of the links on the path.
>
> >Some caution seems in order relating to aggregation of loss measurements, as
> >loss is not always uncorrelated across links in the path.
> [Qin Wu] Agree, but here we just provide examples.
That is true ... I am approaching this from the sense that there is a pretty
nasty "gotcha" that could trip up an implementor, one that is very adjacent to
what we do talk about, so adding a caution would be only a minor change.
E.g. (after the quoted text), "In contrast, aggregating loss values is
complicated by the potential for correlated loss events on different links
in the path."
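(The aggregation rules from the quoted text, plus the loss caveat, could be sketched in Python as follows; function names are illustrative, and the loss formula assumes independent per-link losses, which is exactly the assumption that correlated losses break.)

```python
# Per-metric aggregation over the links of a path, assuming per-link
# values are available from the routing protocols.
def aggregate_latency(link_delays):
    return sum(link_delays)   # delays add along the path

def aggregate_avail_bw(link_bws):
    return min(link_bws)      # the bottleneck link limits the path

def aggregate_loss(link_loss_rates):
    # Valid only if losses on different links are independent; correlated
    # loss events need a different model.
    p_deliver = 1.0
    for p in link_loss_rates:
        p_deliver *= (1.0 - p)
    return 1.0 - p_deliver
```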
Thanks for these; all the rest (including bits above that I did not reply
to) looks good.
-Ben
> >Section 6
>
> >I thought that the outcome of the art-art review thread was that we would
> >add some mention of ordinal cost mode here as a means to mitigate the risk
> >of exposing sensitive numerical metric values, but I don't see such
> >text.
>
> >In light of the guidance in Section 7 for new cost source types to document
> >their security considerations, should we document the security
> >considerations for the "sla" type here? The overall theme would be similar
> >to what RFC 7285 already describes, but we could mention that knowledge
> >specifically of provider SLA targets allows attackers to target the SLA,
> >causing problems for the provider other than the typical DoS attack
> >class. (I'm not coming up with anything new to say about "nominal" or
> >"estimation".)
>
> >I would also consider mentioning that the "origin" references in table 1
> >might have useful things to say about the individual metrics that we use.
>
> >Giving an attacker the ability to receive the instantaneous loss rate on a
> >path could be useful in helping the attacker gauge the efficacy of an
> >ongoing attack targeting that path. The RFCs from the DOTS WG (e.g.,
> >8783 and 9132) may have some useful text on this topic that could be used as
> >a model.
>
> [Qin Wu] Good suggestion and will integrate these in the next version. Thanks.
> >Section 9.1
>
> >It's not really clear to me that [IANA-IPPM] needs to be classified as
> >normative (or whatever it is replaced by, in light of my earlier comment in
> >§3.1.4).
>
> >RFC 2330 is cited only once, in a "for example" clause; this would typically
> >cause it to be classified as only an informative reference.
>
> >The mention of RFC 8895 is conditional on it being implemented, so that
> >could probably also be downgraded to an informative reference as well.
> [Qin Wu] Okay, good suggestion.
> >Section 9.2
>
> >Some kind of URL for [Prometheus] would be very helpful.
> > [Prophet], too, though at least that has the ACM/IEEE Transactions venue to
> > anchor the reference.
> [Qin Wu] Okay.
> >I'm not entirely sure why RFC 2818 is classified as normative but RFC
> >8446 only as informative, since they are part of the same (quoted)
> >requirement clause.
> [Qin Wu] Tend to agree, will see how to fix.
> >NITS
>
> >Section 2.1
>
> To make it possible to specify the source and the aforementioned
> parameters, this document introduces an optional "cost-context" field
> to the "cost-type" field defined by the ALTO base protocol
> (Section 10.7 of [RFC7285]) as the following:
>
> I think s/"cost-type" field/CostType object/ would be slightly more accurate.
> [Qin Wu] Agree.
> The "estimation" category indicates that the metric value is computed
> through an estimation process. An ALTO server may compute
> "estimation" values by retrieving and/or aggregating information from
> routing protocols (e.g., [RFC8571]) and traffic measurement
> management tools (e.g., TWAMP [RFC5357]), with corresponding
> operational issues. [...]
>
> >I'm not sure if "with corresponding operational issues" conveys the intended
> >phrasing -- to me, it seems to say "do [the previous things], but expect
> >that there will sometimes be operational issues that make the data
> >unavailable or inaccurate".
> [Qin Wu] Yes, we will see how to rephrase this.
> >Section 2.2
>
> > stddev:
> the standard deviation of the observations.
>
>
> > stdvar:
> the standard variance of the observations.
>
> >Pedantically, we could say if these are sample or population standard
> >deviation/variance (a difference of one in the denominator), but it seems
> >very unlikely to matter for these purposes.
> [Qin Wu] I am thinking maybe this can be indicated in the link (a field named
> "link") providing a URI to a description of the "estimation" method.
> >Section 3
>
> > dropped before reaching the destination (pkt.dropped). The semantics
> > of the performance metrics defined in this section are that they are
> > statistics (percentiles) computed from these measures; for example,
>
> >I suggest "e.g., percentiles" since stddev/variance are not percentiles but
> >are statistics.
> [Qin Wu] Reasonable.
> the x-percentile of the one-way delay is the x-percentile of the set
> of delays {pkt.delay} for the packets in the stream.
>
> >This phrasing presupposes that there is a definite stream under
> >consideration, but I don't think that much confusion is likely and am not
> >sure that there's a need to change anything.
> [Qin Wu] If you have a better suggestion, please let us know.
> >Section 3.1.3
>
> >I'd perhaps make a note about the wrapping of the Accept: header field line
> >in the example (and all the other similarly affected examples).
> [Qin Wu] Okay, thanks.
> >Section 3.2.2
>
> >I suggest reusing the phrasing from §3.1.2 that mentions floating-point
> >values, for consistency.
> [Qin Wu] Okay.
> >Section 3.5.2
>
> The metric value type is a single 'JSONNumber' type value conforming
> to the number specification of [RFC8259] Section 6. The number MUST
> be non-negative. The value represents the percentage of packet
> losses.
>
> I'd probably mention floating-point here as well.
> [Qin Wu] Okay.
> Section 4.3.3
>
> Intended Semantics: To specify spatial and temporal maximum
> reservable bandwidth from the specified source to the specified
> destination. The value corresponds to the maximum bandwidth that can
> be reserved (motivated from [RFC3630] Section 2.5.7). The spatial
>
> It's a little interesting to see an OSPF reference for max reservable
> bandwidth when we used an IS-IS one for current residual bandwidth, but it's
> hard to see much harm arising from the mixture of references (who is going to
> follow the references anyway?).
>
> [Qin Wu] Correct, that is our expectation.
>
> Section 7
>
> IANA has created and now maintains the "ALTO Cost Metric Registry",
> listed in Section 14.2, Table 3 of [RFC7285]. This registry is
> located at <https://www.iana.org/assignments/alto-protocol/alto-
> protocol.xhtml#cost-metrics>. This document requests to add the
> following entries to "ALTO Cost Metric Registry".
>
> The live registry has a "reference" column, so I'd add ", with this document
> as the reference", here.
> [Qin Wu] Okay.
> Registered ALTO address type identifiers MUST conform to the
> syntactical requirements specified in Section 2.1. Identifiers are
> to be recorded and displayed as strings.
>
> s/address type/cost source type/
>
> [Qin Wu] Thanks.
>
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto