Hi, Ben:

Since the current document clearly states that the specification of SLA details is out of scope, we authors prefer to make no change to the draft unless we hear an objection. Thanks for the clarification on the BCP 18 question; it was a very useful discussion.
-Qin

-----Original Message-----
From: Benjamin Kaduk [mailto:[email protected]]
Sent: December 9, 2021 7:43
To: Qin Wu <[email protected]>
Cc: [email protected]; [email protected]; The IESG <[email protected]>; Y. Richard Yang <[email protected]>; [email protected]
Subject: Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)

Hi Qin,

It looks like the only topic that's potentially unresolved is the BCP 18
question. I think internationalization is a topic where we mostly look to
the ART ADs for guidance, and I'm reluctant to claim any kind of authority
on the "right thing to do". Mostly I wanted to raise the topic for
visibility in case anyone else had any thoughts; if no one else replies, I
think the authors should do what they feel best (which could include making
no change to the draft).

Thanks,

Ben

On Mon, Dec 06, 2021 at 01:25:20PM +0000, Qin Wu wrote:
> Hi, Ben:
> -----Original Message-----
> From: alto [mailto:[email protected]] On Behalf Of Benjamin Kaduk
> Sent: December 4, 2021 6:30
> To: Qin Wu <[email protected]>
> Cc: [email protected]; [email protected]; The IESG <[email protected]>; Y. Richard Yang <[email protected]>; [email protected]
> Subject: Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)
>
> Hi Qin,
>
> On Thu, Dec 02, 2021 at 09:04:18AM +0000, Qin Wu wrote:
> > Thanks Ben for the detailed and valuable review; see replies and clarifications below.
> >
> > -----Original Message-----
> > >From: Benjamin Kaduk via Datatracker [mailto:[email protected]]
> > >Sent: December 2, 2021 13:05
> > >To: The IESG <[email protected]>
> > >Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> > >Subject: Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)
> >
> > >Benjamin Kaduk has entered the following ballot position for
> > >draft-ietf-alto-performance-metrics-20: Discuss
> >
> > >When responding, please keep the subject line intact and reply to
> > >all email addresses included in the To and CC lines. (Feel free to
> > >cut this introductory paragraph, however.)
> >
> > >Please refer to
> > >https://www.ietf.org/blog/handling-iesg-ballot-positions/
> > >for more information about how to handle DISCUSS and COMMENT positions.
> >
> > >The document, along with other ballot positions, can be found here:
> > >https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
> >
> > >----------------------------------------------------------------------
> > >DISCUSS:
> > >----------------------------------------------------------------------
> >
> > >These should all be trivial to resolve -- just some minor internal
> > >inconsistencies that need to be fixed before publication.
> >
> > >The discussion of percentile statistical operator in §2.2 is internally
> > >inconsistent -- if the percentile number must be an integer, then p99.9 is
> > >not valid.
> > [Qin Wu] Yes, the percentile is a number following the letter 'p',
> > but in some cases, when high precision is needed, this percentile number
> > is followed by an optional decimal part. The decimal part should
> > start with the '.' separator. Maybe the separator caused your confusion.
> > See the definition in Section 2.2 for details:
> > "
> > percentile, with letter 'p' followed by a number:
> >    gives the percentile specified by the number following the letter
> >    'p'. The number MUST be a non-negative JSON integer in the range
> >    [0, 100] (i.e., greater than or equal to 0 and less than or equal
> >    to 100), followed by an optional decimal part, if a higher
> >    precision is needed. The decimal part should start with the '.'
> >    separator (U+002E), and followed by a sequence of one or more
> >    ASCII numbers between '0' and '9'.
> > "
> > Let us know if you think the separator should be changed or if you can
> > live with the current form.
>
> Oops, that's my mistake and you are correct. Sorry about that; I agree that
> no change is needed here.
>
> [Qin Wu] Great, thanks.
>
> > >Also, the listing of "cost-source" values introduced by this document (in
> > >§5.1) does not include "nominal", but we do also introduce "nominal".
> > [Qin Wu] I agree with this inconsistency issue; it should be fixed in the
> > next version. Thanks.
> >
> > >Similarly, in §3.1.3 we refer to the "-<percentile>" component of a cost
> > >metric string, that has been generalized to an arbitrary statistical
> > >operator.
> > [Qin Wu] No, it is not an arbitrary statistical operator. We did add a
> > statement to say
> > "
> >    Since the identifier
> >    does not include the -<percentile> component, the values will
> >    represent median values.
> > "
> > The median value is defined as the mid-point of the observations; see the
> > median definition in Section 2.2:
> > "
> > median:
> >    the mid-point (i.e., p50) of the observations.
> > "
>
> Hmm, I am not sure whether my point came through properly or not. Let me try
> again.
>
> In Section 3.1.3, we see the text:
>
>    Comment: Since the "cost-type" does not include the "cost-source"
>    field, the values are based on "estimation". Since the identifier
>    does not include the -<percentile> component, the values will
>    represent median values.
>
> This is the only place in the document where the string "<percentile>"
> appears, and in particular we do not define a "percentile component"
> anywhere that I can see. We do, however, define a "statistical operator"
> string (component) of a cost metric string, in Section 2.2. In particular,
> we do have options for the statistical operator string that are *not*
> representable as percentile values, such as stddev and cur. So, I think it
> is inaccurate to write "-<percentile>" component here. I propose to instead
> say "Since the identifier does not include a statistical operator component,
> the values will represent median values."
>
> [Qin Wu] Thanks for the clarification; I agree with your proposed change.
>
> > >----------------------------------------------------------------------
> > >COMMENT:
> > >----------------------------------------------------------------------
> >
> > >All things considered, this is a pretty well-written document that was
> > >easy to read. That helped a lot as I reviewed it, especially so on a week
> > >with a pretty full agenda for the IESG telechat.
> >
> > >Section 2.2
> >
> > >Should we say anything about how to handle a situation where a base metric
> > >identifier is so long that the statistical operator string cannot be
> > >appended while remaining under the 32-character limit?
> > [Qin Wu] I think the base metric identifier should not be chosen
> > arbitrarily. Using the full name of the base metric is not recommended; a
> > short name or abbreviation should probably be used if the cost metric
> > string would otherwise be too long. But I am not sure we should set a rule
> > for this. Maybe the rule "The total length of the cost metric string MUST
> > NOT exceed 32" defined in RFC7285 is sufficient?
>
> As far as formal requirements go, that may be all we need.
> Assuming that no
> one needs a percentile value with more than two digits of precision after the
> decimal point, the longest statistical operator component we currently define
> is seven characters, e.g., "-median". So if someone happens to define a base
> metric identifier that's more than 25 characters, we set ourselves up for a
> situation where we can use the base metric but can't use -median, -stddev, or
> -stdvar. If it's less than 28 characters we could still use -cur, -min,
> -max, etc., which would be a rather strange situation to be in!
>
> I suspect that the right practical approach, if this situation ever arose,
> would be to define a new base metric identifier that's an alias for the
> existing one -- just a shorter name but with the same semantics. So we might
> end up with some text like:
>
> % RFC 7285 limits the overall cost metric identifier to 32 characters. The
> % cost metric variants with statistical operator suffixes defined by this
> % document are also subject to the same overall 32-character limit, so
> % certain combinations of (long) base metric identifier and statistical
> % operator will not be representable. If such a situation arises, it could
> % be addressed by defining a new base metric identifier that is an "alias"
> % of the desired base metric, with identical semantics and just a shorter
> % name.
>
> [Qin Wu] The proposed changes look great; thanks for the input.
>
> > >   min:
> > >      the minimal value of the observations.
> > >   max:
> > >      the maximal value of the observations.
> > >   [...]
> >
> > >Should we say anything about what sampling period of observations is in
> > >scope for these operators?
> > [Qin Wu] I think the sampling period of observations is related to the
> > Method of Measurement or Calculation. Based on earlier discussion and
> > agreement in the group, we believe this depends more on the measurement
> > methodology or metric definition, which in some cases may not be necessary
> > or feasible to specify; the metric definition RFCs can be consulted for
> > more details. See the clarification in Section 2.
>
> Okay, that's a good way to handle it.
> [Qin Wu] Thanks.
>
> > >Section 3.x.4
> >
> > >If we're going to be recommending that implementations link to
> > >external human-readable resources (e.g., for the SLA details of estimation
> > >methodology), does the guidance from BCP 18 in indicating the language of
> > >text come into play?
>
> (This was a separate point than the following paragraph, to be clear.
> I don't have a good answer to propose.)
>
> [Qin Wu] I missed this comment, sorry about that.
> I think the specification of SLA details is not in the scope of this
> document, but BCP 18 Section 4.1 and Section 4.5 provide some guidelines on
> how to specify those details. Let me know if you would prefer us to add a
> reference to BCP 18 instead of leaving those details out of scope.
>
> > >It's also a bit surprising that we specify the new fields in the
> > >"parameters" of a metric just in passing in the prose, without a more
> > >prominent indication that we're defining a new field.
> > [Qin Wu] See the CostContext definition in Section 2.1; "parameters" is
> > included in the CostContext object.
>
> Ah. I think I forgot that the "parameters" were new in this document; sorry
> about that.
> [Qin Wu] No problem.
>
> > >Section 3.1.4
> >
> > >   "nominal": Typically network one-way delay does not have a nominal
> > >   value.
> >
> > >Does that mean that they MUST NOT be generated, or that they should
> > >be ignored if received, or something else? (Similarly for the
> > >other sections where we say the same thing.)
> > [Qin Wu] Yes, that is my understanding.
> > We can add a statement to make this behavior clear.
> >
> > >   This description can be either free text for possible presentation to
> > >   the user, or a formal specification; see [IANA-IPPM] for the
> > >   specification on fields which should be included. [...]
> >
> > >Is the IANA registry really the best reference for what fields to include?
> > >Typically we would only refer to the registry when we care about the
> > >current state of registered values, but the need here seems to effectively
> > >be the column headings of the registry, which could be obtained from the
> > >RFC defining the registry.
> > [Qin Wu] This IANA registry provides the Metric Name and Metric URI;
> > following the URI gives more details of the measurement methodology. That
> > is why the [IANA-IPPM] reference was selected. Maybe we can make this
> > clearer in the text.
> >
> > >Section 3.3.3
> >
> > >   Intended Semantics: To specify spatial and temporal aggregated delay
> > >   variation (also called delay jitter)) with respect to the minimum
> > >   delay observed on the stream over the one-way delay from the
> > >   specified source and destination. The spatial aggregation level is
> > >   specified in the query context (e.g., PID to PID, or endpoint to
> > >   endpoint).
> >
> > >I do appreciate the note about how this is not the normal
> > >statistics variation that follows this paragraph, but I also don't
> > >think this is a particularly clear or precise specification for how
> > >to produce the number that is to be reported. It also doesn't seem to
> > >fully align with the prior art in the IETF, e.g., RFC 3393. It
> > >seems like it would be highly preferable to pick an existing RFC
> > >and refer to its specification for computing a delay variation
> > >value. (To be clear, such a reference would then be a normative
> > >reference.)
> > [Qin Wu] Agreed. We are not introducing a new metric; we just expose the
> > existing metric defined in RFC3393.
> > I also agree to make RFC3393 a normative reference; we will see how to
> > fix this.
> >
> > >Section 3.4.3
> >
> > >   Intended Semantics: To specify the number of hops in the path from
> > >   the specified source to the specified destination. The hop count is
> > >   a basic measurement of distance in a network and can be exposed as
> > >   the number of router hops computed from the routing protocols
> > >   originating this information. [...]
> >
> > >It seems like this could get a little messy if there are multiple routing
> > >protocols in use (e.g., both normal IP routing and an overlay network, as
> > >for service function chaining or other overlay schemes).
> > >I don't have any suggestions for disambiguating things, though, and if the
> > >usage is consistent within a given ALTO Server it may not have much impact
> > >on the clients.
> > [Qin Wu] Hop count is implicitly mentioned in RFC7285; this document
> > specifies the metric explicitly.
> > I am thinking that the protocol in use can be indicated in the link (a
> > field named "link") providing a URI to a description of the "estimation"
> > method.
> >
> > >Section 3.4.4
> >
> > >   "sla": Typically hop count does not have an SLA value.
> >
> > >As for "nominal", earlier, is there any guidance to give on not generating
> > >it or what to do if it is received?
> > >(Also appears later, I suppose.)
> > [Qin Wu] We will see how to provide guidance on this, thanks.
> >
> > >Section 4.1.4
> >
> > >   "estimation": The exact estimation method is out of the scope of this
> > >   document. See [Prophet] for a method to estimate TCP throughput. It
> > >   is RECOMMENDED that the "parameters" field of an "estimation" TCP
> > >   throughput metric provides two fields: (1) a congestion-control
> > >   algorithm name (a field named "congestion-control-alg"); and (2) a
> > >   link (a field named "link")to a description of the "estimation"
> > >   method.
> > >   Note that as TCP congestion control algorithms evolve (e.g.,
> > >   TCP Cubic Congestion Control [I-D.ietf-tcpm-rfc8312bis]), it helps to
> > >   specify as many details as possible on the congestion control
> > >   algorithm used. This description can be either free text for
> > >   possible presentation to the user, or a formal specification.
> > >   [...]
> >
> > >Do these specifics go into the "congestion-control-alg" name, or in the
> > >linked content?
> > [Qin Wu] My understanding is the latter, but the two fields will be
> > provided by one "parameters" field, which can be seen as a JSON object
> > since "parameters" is the plural of "parameter".
>
> I was hoping it would be the latter :) Maybe add a clause at the end
> of the last quoted sentence like ", as part of the linked contents"?
>
> [Qin Wu] Okay, we will add clarifying text, thanks.
>
> > >Section 5.3
> >
> > >   To address the backward-compatibility issue, if a "cost-metric" is
> > >   "routingcost" and the metric contains a "cost-context" field, then it
> > >   MUST be "estimation"; if it is not, the client SHOULD reject the
> > >   information as invalid.
> >
> > >This seems like a sub-optimal route to backwards compatibility, as it
> > >would (apparently) permanently lock the "routingcost" metric to only the
> > >"estimation" source with no way to negotiate more flexibility. Unless we
> > >define a new "routingcost2" metric that differs only in the lack of this
> > >restriction, of course.
> > [Qin Wu] Probably we should have a default value for cost-context. I think
> > the default value is "estimation", since legacy clients only support
> > metric estimation.
> >
> > >Section 5.4.1
> >
> > >   the ALTO server may provide the client with two pieces of additional
> > >   information: (1) when the metrics are last computed, and (2) when the
> > >   metrics will be updated (i.e., the validity period of the exposed
> > >   metric values).
> > >   The ALTO server can expose these two pieces of
> > >   information by using the HTTP response headers Last-Modified and
> > >   Expires.
> >
> > >While this seems like it would work okay in the usual case, it seems a bit
> > >fragile, in that it may fail in boundary cases, such as when a server is
> > >just starting up. I would lean towards recommending use of explicit data
> > >items to convey this sort of information (and also the overall measurement
> > >interval over which statistics are computed, which may not always go back
> > >to "the start of time").
> > [Qin Wu] Okay.
> >
> > >Section 5.4.2
> >
> > >   often be link level. For example, routing protocols often measure
> > >   and report only per link loss, not end-to-end loss; similarly,
> > >   routing protocols report link level available bandwidth, not end-to-
> > >   end available bandwidth. The ALTO server then needs to aggregate
> > >   these data to provide an abstract and unified view that can be more
> > >   useful to applications. The server should consider that different
> > >   metrics may use different aggregation computation. For example, the
> > >   end-to-end latency of a path is the sum of the latency of the links
> > >   on the path; the end-to-end available bandwidth of a path is the
> > >   minimum of the available bandwidth of the links on the path.
> >
> > >Some caution seems in order relating to aggregation of loss measurements,
> > >as loss is not always uncorrelated across links in the path.
> > [Qin Wu] Agree, but here we just provide examples.
>
> That is true ... I am approaching this from the sense that there is a pretty
> nasty "gotcha" that could trip up an implementor, that is very adjacent to
> what we do talk about, so adding a caution would be only a minor change.
> E.g. (after the quoted text), "In contrast, aggregating loss values is
> complicated by the potential for correlated loss events on different links in
> the path."
>
> [Qin Wu] Agreed, we will add a caution. Thanks for the proposed text; we will consider it.
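
For readers following the Section 5.4.2 exchange, the aggregation rules quoted there can be sketched in a few lines. This is only an illustration: the function names are hypothetical and do not come from the draft. The loss function in particular assumes independent per-link loss, which is exactly the assumption the proposed caution warns may not hold.

```python
import math
from typing import Sequence


def path_latency(link_latencies: Sequence[float]) -> float:
    """End-to-end latency as the sum of per-link latencies (per the quoted text)."""
    return sum(link_latencies)


def path_available_bandwidth(link_bandwidths: Sequence[float]) -> float:
    """End-to-end available bandwidth as the minimum over the links on the path.

    Raises ValueError for an empty path, since min() has no value to return.
    """
    return min(link_bandwidths)


def path_loss_assuming_independence(link_loss_rates: Sequence[float]) -> float:
    """Combine per-link loss probabilities as 1 - prod(1 - p_i).

    ASSUMPTION: loss events on different links are independent. With
    correlated loss (the case raised in the review) this formula can be
    misleading, so treat the result as a rough estimate only.
    """
    return 1.0 - math.prod(1.0 - p for p in link_loss_rates)


if __name__ == "__main__":
    print(path_latency([1.0, 2.5, 0.5]))
    print(path_available_bandwidth([100.0, 40.0, 75.0]))
    print(path_loss_assuming_independence([0.1, 0.1]))
```

Latency and bandwidth compose with a plain sum and minimum, while loss has no comparably safe closed form unless independence is assumed, which is why a short cautionary sentence in the draft seems a cheap addition.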

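
The Section 2.2 points discussed in this thread (the percentile operator syntax and the 32-character limit) can also be checked mechanically. The sketch below is an assumption-laden illustration, not the draft's grammar: the helper names are hypothetical, the set of named operators is limited to those mentioned in this thread, and the '-' separator is taken from the way suffixes such as "-median" are written above.

```python
import re

# Overall limit on the cost metric string, as discussed in this thread.
MAX_METRIC_LEN = 32

# Percentile operator per the quoted Section 2.2 text: the letter 'p', a JSON
# integer in [0, 100] (so no leading zeros), then an optional '.' followed by
# one or more ASCII digits. The quoted text constrains only the integer part,
# so this sketch does the same (e.g. it does not reject "p100.5").
_PERCENTILE_RE = re.compile(r"p(100|[1-9]?[0-9])(\.[0-9]+)?")

# ASSUMPTION: only the named operators mentioned in this thread are listed;
# the draft may define others.
_NAMED_OPERATORS = {"cur", "min", "max", "median", "stddev", "stdvar"}


def is_valid_operator(op: str) -> bool:
    """Return True if op looks like a statistical operator string."""
    return op in _NAMED_OPERATORS or _PERCENTILE_RE.fullmatch(op) is not None


def build_metric(base: str, op: str, sep: str = "-") -> str:
    """Join a base metric identifier and an operator, enforcing the length limit.

    ASSUMPTION: sep defaults to '-' as used in this thread.
    """
    if not is_valid_operator(op):
        raise ValueError(f"not a recognized statistical operator: {op!r}")
    metric = f"{base}{sep}{op}"
    if len(metric) > MAX_METRIC_LEN:
        raise ValueError(
            f"{metric!r} is {len(metric)} characters; the limit is {MAX_METRIC_LEN}"
        )
    return metric


if __name__ == "__main__":
    print(build_metric("delay-ow", "p99.9"))
    long_base = "x" * 26  # hypothetical 26-character base identifier
    print(build_metric(long_base, "cur"))
    try:
        build_metric(long_base, "median")
    except ValueError as err:
        print("rejected:", err)
```

With a 26-character base identifier, "-cur" still fits but "-median" does not, which reproduces the odd situation described in the review and motivates the shorter "alias" identifier suggested there.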