Re: [alto] Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)

Qin Wu Thu, 09 Dec 2021 18:46:36 -0800

Hi, Ben:
Since the current document clearly states that the specification of SLA details 
is out of scope, we authors prefer to make no change unless I hear an 
objection to this. 
Thanks, Ben, for the clarification on the BCP 18 question. It was a very useful 
discussion.


-Qin
-----Original Message-----
From: Benjamin Kaduk [mailto:[email protected]] 
Sent: December 9, 2021 7:43
To: Qin Wu <[email protected]>
Cc: [email protected]; [email protected]; The IESG 
<[email protected]>; Y. Richard Yang <[email protected]>; [email protected]
Subject: Re: [alto] Benjamin Kaduk's Discuss on 
draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)

Hi Qin,

It looks like the only topic that's potentially unresolved is the BCP 18 
question.  I think internationalization is a topic where we mostly look to the 
ART ADs for guidance, and I'm reluctant to claim any kind of authority on the 
"right thing to do".  Mostly I wanted to raise the topic for visibility in case 
anyone else had any thoughts; if no one else replies, I think the authors 
should do what they feel best (which could include making no change to the 
draft).

Thanks,

Ben

On Mon, Dec 06, 2021 at 01:25:20PM +0000, Qin Wu wrote:
> Hi, Ben:
> -----Original Message-----
> From: alto [mailto:[email protected]] On Behalf Of Benjamin Kaduk
> Sent: December 4, 2021 6:30
> To: Qin Wu <[email protected]>
> Cc: [email protected]; [email protected]; The 
> IESG <[email protected]>; Y. Richard Yang <[email protected]>; 
> [email protected]
> Subject: Re: [alto] Benjamin Kaduk's Discuss on 
> draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)
> 
> Hi Qin,
> 
> On Thu, Dec 02, 2021 at 09:04:18AM +0000, Qin Wu wrote:
> > Thanks, Ben, for the detailed and valuable review; see replies and clarifications below.
> > 
> > -----Original Message-----
> > >From: Benjamin Kaduk via Datatracker [mailto:[email protected]]
> > >Sent: December 2, 2021 13:05
> > >To: The IESG <[email protected]>
> > >Cc: [email protected];
> > >[email protected]; [email protected]; [email protected]; [email protected]
> > >Subject: Benjamin Kaduk's Discuss on
> > >draft-ietf-alto-performance-metrics-20: (with DISCUSS and COMMENT)
> > 
> > >Benjamin Kaduk has entered the following ballot position for
> > >draft-ietf-alto-performance-metrics-20: Discuss
> > 
> > >When responding, please keep the subject line intact and reply to 
> > >all email addresses included in the To and CC lines. (Feel free to 
> > >cut this introductory paragraph, however.)
> > 
> > 
> > >Please refer to
> > >https://www.ietf.org/blog/handling-iesg-ballot-positions/
> > >for more information about how to handle DISCUSS and COMMENT positions.
> > 
> > 
> > >The document, along with other ballot positions, can be found here:
> > >https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
> > 
> > 
> > 
> > >----------------------------------------------------------------------
> > >DISCUSS:
> > >----------------------------------------------------------------------
> > 
> > >These should all be trivial to resolve -- just some minor internal 
> > >inconsistencies that need to be fixed before publication.
> > 
> > >The discussion of percentile statistical operator in §2.2 is internally 
> > >inconsistent -- if the percentile number must be an integer, then p99.9 is 
> > >not valid.
> > [Qin Wu] Yes, the percentile is a number following the letter 'p', 
> > but in some cases, when higher precision is needed, this number is 
> > followed by an optional decimal part. The decimal part should start with 
> > the '.' separator. Maybe the separator caused the confusion. See the 
> > definition in Section 2.2 for details:
> > "
> >    percentile, with letter 'p' followed by a number:
> >       gives the percentile specified by the number following the letter
> >       'p'.  The number MUST be a non-negative JSON integer in the range
> >       [0, 100] (i.e., greater than or equal to 0 and less than or equal
> >       to 100), followed by an optional decimal part, if a higher
> >       precision is needed.  The decimal part should start with the '.'
> >       separator (U+002E), and followed by a sequence of one or more
> >       ASCII numbers between '0' and '9'.
> > "
> > Let us know if you think the separator should be changed or whether you 
> > can live with the current form.
> 
> Oops, that's my mistake and you are correct.  Sorry about that; I agree that 
> no change is needed here.
> 
> [Qin Wu] Great, thanks.
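
As an illustration of the percentile form quoted from Section 2.2 above, here
is a minimal Python sketch of a validator. The function name and the rejection
of values above 100 (for example "p100.5") are assumptions made for this
sketch; the quoted draft text does not spell out that case.

```python
import re
from decimal import Decimal

# 'p', then a JSON integer in [0, 100] (no leading zeros), then an optional
# decimal part: '.' (U+002E) followed by one or more ASCII digits.
_PERCENTILE_RE = re.compile(r"p(0|[1-9][0-9]?|100)(\.[0-9]+)?")


def is_valid_percentile_operator(op: str) -> bool:
    """Return True if `op` matches the percentile form quoted from Section 2.2."""
    if _PERCENTILE_RE.fullmatch(op) is None:
        return False
    # Assumption (not stated in the quoted text): values above 100, such as
    # "p100.5", are rejected because they do not name a real percentile.
    return Decimal(op[1:]) <= Decimal(100)
```

Under this sketch "p99.9" and "p50" are accepted, while "p99." is rejected
because the decimal part needs at least one digit after the separator.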
> > >Also, the listing of "cost-source" values introduced by this document (in 
> > >§5.1) does not include "nominal", but we do also introduce "nominal".
> > [Qin Wu] I agree this is an inconsistency; it should be fixed in the next 
> > version. Thanks.
> > >Similarly, in §3.1.3 we refer to the "-<percentile>" component of a cost 
> > >metric string, that has been generalized to an arbitrary statistical 
> > >operator.
> > [Qin Wu] No, it is not an arbitrary statistical operator. We did add a 
> > statement to say "
> >    Since the identifier
> >    does not include the -<percentile> component, the values will
> >    represent median values.
> > "
> > The median value is defined in Section 2.2 as the mid-point of the 
> > observations; see the median definition: "
> >    median:
> >       the mid-point (i.e., p50) of the observations.
> > "
> 
> Hmm, I am not sure whether my point came through properly or not.  Let me try 
> again.
> 
> In Section 3.1.3, we see the text:
> 
>    Comment: Since the "cost-type" does not include the "cost-source"
>    field, the values are based on "estimation".  Since the identifier
>    does not include the -<percentile> component, the values will
>    represent median values.
> 
> This is the only place in the document where the string "<percentile>"
> appears, and in particular we do not define a "percentile component"
> anywhere that I can see.  We do, however, define a "statistical operator"
> string (component) of a cost metric string, in Section 2.2.  In particular, 
> we do have options for the statistical operator string that are *not* 
> representable as percentile values, such as stddev and cur.  So, I think it 
> is inaccurate to write "-<percentile>" component here.  I propose to instead 
> say "Since the identifier does not include a statistical operator component, 
> the values will represent median values."
> 
> [Qin Wu] Thanks for the clarification; I agree with your proposed change.
> > >----------------------------------------------------------------------
> > >COMMENT:
> > >----------------------------------------------------------------------
> > 
> > >All things considered, this is a pretty well-written document that was 
> > >easy to read.  That helped a lot as I reviewed it, especially so on a week 
> > >with a pretty full agenda for the IESG telechat.
> > 
> > >Section 2.2
> > 
> > >Should we say anything about how to handle a situation where a base metric 
> > >identifier is so long that the statistical operator string cannot be 
> > >appended while remaining under the 32-character limit?
> > [Qin Wu] I think the base metric identifier should not be chosen 
> > arbitrarily; using the full name of the base metric is not recommended, and 
> > a short name or abbreviation should probably be used if the cost metric 
> > string is too long.
> > But I am not sure we should set a rule for this. Maybe the rule "The total 
> > length of the cost metric string MUST NOT exceed 32 " defined in RFC7285 is 
> > sufficient? 
> 
> As far as formal requirements go, that may be all we need.  Assuming that no 
> one needs a percentile value with more than two digits of precision after the 
> decimal point, the longest statistical operator component we currently define 
> is seven characters, e.g., "-median".  So if someone happens to define a base 
> metric identifier that's more than 25 characters, we set ourselves up for a 
> situation where we can use the base metric but can't use -median, -stddev, or 
> -stdvar.  If it's less than 28 characters we could still use -cur, -min, 
> -max, etc., which would be a rather strange situation to be in!
> 
> I suspect that the right practical approach, if this situation ever arose, 
> would be to define a new base metric identifier that's an alias for the 
> existing one -- just a shorter name but with the same semantics.  So we might 
> end up with some text like:
> 
> % RFC 7285 limits the overall cost metric identifier to 32 characters.
> % The cost metric variants with statistical operator suffixes defined by
> % this document are also subject to the same overall 32-character limit,
> % so certain combinations of (long) base metric identifier and statistical
> % operator will not be representable.  If such a situation arises, it
> % could be addressed by defining a new base metric identifier that is an
> % "alias" of the desired base metric, with identical semantics and just a
> % shorter name.
> 
> [Qin Wu] The proposed changes look great, thanks for the input.
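
The length arithmetic discussed above can be checked with a short Python
sketch. The operator list holds only the operators named in this thread plus
one two-decimal percentile, and the function name and the placeholder base
metric names used with it are hypothetical.

```python
from typing import List, Sequence

MAX_COST_METRIC_LEN = 32  # overall limit from RFC 7285, as cited in the thread

# Operators named in this thread; not necessarily the full list in the draft.
OPERATORS = ("cur", "min", "max", "median", "stddev", "stdvar", "p99.99")


def usable_operators(base_metric: str,
                     operators: Sequence[str] = OPERATORS) -> List[str]:
    """Return the operators whose '<base>-<operator>' string fits the limit."""
    return [op for op in operators
            if len(f"{base_metric}-{op}") <= MAX_COST_METRIC_LEN]
```

For a placeholder base metric of 26 characters, `usable_operators("x" * 26)`
keeps only the three-letter operators, which is the "rather strange situation"
described above; a 25-character base still fits every operator in the list.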
> > >   min:
> > >      the minimal value of the observations.
> > >   max:
> > >      the maximal value of the observations.
> > >   [...]
> > 
> > >Should we say anything about what sampling period of observations is in 
> > >scope for these operators?
> > [Qin Wu] I think the sampling period of the observations is related to the 
> > Method of Measurement or Calculation. Based on earlier discussion and 
> > agreement in the group, we believe this depends more on the measurement 
> > methodology or metric definition, which in some cases is not necessary or 
> > feasible to specify; the metric definition RFC can be consulted for more 
> > details. See the clarification in Section 2. 
> 
> Okay, that's a good way to handle it.
> [Qin Wu] Thanks.
> > >Section 3.x.4
> > 
> > >If we're going to be recommending that implementations link to 
> > >external human-readable resources (e.g., for the SLA details of estimation 
> > >methodology), does the guidance from BCP 18 in indicating the language of 
> > >text come into play?
> 
> (This was a separate point from the following paragraph, to be clear.  
> I don't have a good answer to propose.)
> 
> [Qin Wu] I missed this comment, sorry about that.
> I think the specification of SLA details is not in scope for this document, 
> but BCP 18 Section 4.1 and Section 4.5 provide some guidelines on how to 
> specify those details. Let me know if you prefer that we add a reference to 
> BCP 18 instead of leaving those details out of scope.
> 
> > >It's also a bit surprising that we specify the new fields in the 
> > >"parameters" of a metric just in passing in the prose, without a more 
> > >prominent indication that we're defining a new field.
> > [Qin Wu] See the CostContext definition in Section 2.1; "parameters" is 
> > included in the CostContext object.
> 
> Ah.  I think I forgot that the "parameters" were new in this document; sorry 
> about that.
> [Qin Wu] No problem.
> > >Section 3.1.4
> > 
> > >   "nominal": Typically network one-way delay does not have a nominal
> > >   value.
> > 
> > >Does that mean that they MUST NOT be generated, or that they should 
> > >be ignored if received, or something else?  (Similarly for the 
> > >other sections where we say the same thing.)
> > [Qin Wu] Yes, that is my understanding. We can add a statement to make this 
> > behavior clear.
> > 
> > >   This description can be either free text for possible presentation to
> > >   the user, or a formal specification; see [IANA-IPPM] for the
> > >   specification on fields which should be included.  [...]
> > 
> > >Is the IANA registry really the best reference for what fields to include? 
> > > Typically we would only refer to the registry when we care about the 
> > >current state of registered values, but the need here seems to effectively 
> > >be the column headings of the registry, which could be obtained from the 
> > >RFC defining the registry.
> > [Qin Wu] This IANA registry provides the Metric Name and Metric URI; 
> > following the URI gives more details of the measurement methodology. That 
> > is why the [IANA-IPPM] reference was selected; maybe we can make this 
> > clearer in the text.
> > 
> > >Section 3.3.3
> > 
> > >   Intended Semantics: To specify spatial and temporal aggregated delay
> > >   variation (also called delay jitter)) with respect to the minimum
> > >   delay observed on the stream over the one-way delay from the
> > >   specified source and destination.  The spatial aggregation level is
> > >   specified in the query context (e.g., PID to PID, or endpoint to
> > >   endpoint).
> > 
> > >I do appreciate the note about how this is not the normal 
> > >statistics variation that follows this paragraph, but I also don't 
> > >think this is a particularly clear or precise specification for how 
> > >to produce the number that is to be reported.  It also doesn't seem to 
> > >fully align with the prior art in the IETF, e.g., RFC 3393.  It 
> > >seems like it would be highly preferable to pick an existing RFC 
> > >and refer to its specification for computing a delay variation 
> > >value.  (To be clear, such a reference would then be a normative 
> > >reference.)
> > [Qin Wu] Agreed, we are not introducing a new metric; we just expose the 
> > existing metric defined in RFC3393. I also agree to make RFC3393 a 
> > normative reference, and will see how to fix this.
> > >Section 3.4.3
> > 
> > >   Intended Semantics: To specify the number of hops in the path from
> > >   the specified source to the specified destination.  The hop count is
> > >   a basic measurement of distance in a network and can be exposed as
> > >   the number of router hops computed from the routing protocols
> > >   originating this information.  [...]
> > 
> > >It seems like this could get a little messy if there are multiple routing 
> > >protocols in use (e.g., both normal IP routing and an overlay network, as 
> > >for service function chaining or other overlay schemes).
> > >I don't have any suggestions for disambiguating things, though, and if the 
> > >usage is consistent within a given ALTO Server it may not have much impact 
> > >on the clients.
> > [Qin Wu] Hop count is implicitly mentioned in RFC7285; this document 
> > specifies the metric explicitly.
> > I am thinking that which protocol is used can be indicated in the link (a 
> > field named "link") providing a URI to a description of the "estimation" 
> > method.
> > >Section 3.4.4
> > 
> > >   "sla": Typically hop count does not have an SLA value.
> > 
> > >As for "nominal", earlier, is there any guidance to give on not generating 
> > >it or what to do if it is received?
> > > (Also appears later, I suppose.)
> > [Qin Wu] Will see how to provide guidance on this, thanks.
> > >Section 4.1.4
> > 
> > >   "estimation": The exact estimation method is out of the scope of this
> > >   document.  See [Prophet] for a method to estimate TCP throughput.  It
> > >   is RECOMMENDED that the "parameters" field of an "estimation" TCP
> > >   throughput metric provides two fields: (1) a congestion-control
> > >   algorithm name (a field named "congestion-control-alg"); and (2) a
> > >   link (a field named "link")to a description of the "estimation"
> > >   method.  Note that as TCP congestion control algorithms evolve (e.g.,
> > >   TCP Cubic Congestion Control [I-D.ietf-tcpm-rfc8312bis]), it helps to
> > >   specify as many details as possible on the congestion control
> > >   algorithm used.  This description can be either free text for
> > >   possible presentation to the user, or a formal specification.  
> > > [...]
> > 
> > >Do these specifics go into the "congestion-control-alg" name, or in the 
> > >linked content?
> > [Qin Wu] My understanding is the latter, but both fields will be provided in 
> > the single "parameters" field, which can be seen as a JSON object since 
> > "parameters" is the plural of "parameter".
> 
> I was hoping it would be the latter :) Maybe add a clause at the end 
> of the last quoted sentence like ", as part of the linked contents"?
> 
> [Qin Wu] Okay, will add clarified text, thanks.
> 
> > >Section 5.3
> > 
> > >   To address the backward-compatibility issue, if a "cost-metric" is
> > >   "routingcost" and the metric contains a "cost-context" field, then it
> > >   MUST be "estimation"; if it is not, the client SHOULD reject the
> > >   information as invalid.
> > 
> > >This seems like a sub-optimal route to backwards compatibility, as it 
> > >would (apparently) permanently lock the "routingcost" metric to only the 
> > >"estimation" source with no way to negotiate more flexibility.  Unless we 
> > >define a new "routingcost2" metric that differs only in the lack of this 
> > >restriction, of course.
> > [Qin Wu] Probably we should have a default value for cost-context. I think 
> > the default value is estimation, since legacy clients only support metric 
> > estimation.
> > >Section 5.4.1
> > 
> > >   the ALTO server may provide the client with two pieces of additional
> > >   information: (1) when the metrics are last computed, and (2) when the
> > >   metrics will be updated (i.e., the validity period of the exposed
> > >   metric values).  The ALTO server can expose these two pieces of
> > >   information by using the HTTP response headers Last-Modified and
> > >   Expires.
> > 
> > >While this seems like it would work okay in the usual case, it seems a bit 
> > >fragile, in that it may fail in boundary cases, such as when a server is 
> > >just starting up.  I would lean towards recommending use of explicit data 
> > >items to convey this sort of information (and also the overall measurement 
> > >interval over which statistics are computed, which may not always go back 
> > >to "the start of time").
> > [Qin Wu] Okay.
> > >Section 5.4.2
> > 
> > >   often be link level.  For example, routing protocols often measure
> > >   and report only per link loss, not end-to-end loss; similarly,
> > >   routing protocols report link level available bandwidth, not end-to-
> > >   end available bandwidth.  The ALTO server then needs to aggregate
> > >   these data to provide an abstract and unified view that can be more
> > >   useful to applications.  The server should consider that different
> > >   metrics may use different aggregation computation.  For example, the
> > >   end-to-end latency of a path is the sum of the latency of the links
> > >   on the path; the end-to-end available bandwidth of a path is the
> > >   minimum of the available bandwidth of the links on the path.
> > 
> > >Some caution seems in order relating to aggregation of loss measurements, 
> > >as loss is not always uncorrelated across links in the path.
> > [Qin Wu] Agree, but here we just provide examples.
> 
> That is true ... I am approaching this from the sense that there is a pretty 
> nasty "gotcha" that could trip up an implementor, that is very adjacent to 
> what we do talk about, so adding a caution would be only a minor change.
> E.g. (after the quoted text), "In contrast, aggregating loss values is 
> complicated by the potential for correlated loss events on different links in 
> the path."
> 
> [Qin Wu] Agree to add a caution; thanks for the proposed text, we will consider it.
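
To make the aggregation examples from Section 5.4.2 and the loss caution above
concrete, here is a small Python sketch. The function names are hypothetical,
and the loss formula is an assumption that holds only if loss events on
different links are independent, which is exactly what the caution says cannot
be taken for granted.

```python
import math
from typing import Sequence


def path_latency(link_latencies: Sequence[float]) -> float:
    """End-to-end latency: the sum of the link latencies on the path."""
    return sum(link_latencies)


def path_available_bandwidth(link_bandwidths: Sequence[float]) -> float:
    """End-to-end available bandwidth: the minimum over the links."""
    return min(link_bandwidths)


def path_loss_if_independent(link_loss_rates: Sequence[float]) -> float:
    """End-to-end loss rate, ASSUMING independent loss on each link.

    With correlated loss events this value can be misleading.
    """
    return 1.0 - math.prod(1.0 - p for p in link_loss_rates)
```

Latency and bandwidth aggregate with a plain sum and minimum, but the loss
function is only as good as its independence assumption, so a server that
observes correlated loss would need a different estimate.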
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
