Hi Qin,
On Thu, Dec 02, 2021 at 09:04:18AM +0000, Qin Wu wrote:
> Thanks Ben for the detailed, valuable review; see replies and clarifications below.
>
> -----Original Message-----
> >From: Benjamin Kaduk via Datatracker [mailto:[email protected]]
> >Sent: December 2, 2021 13:05
> >To: The IESG <[email protected]>
> >Cc: [email protected]; [email protected];
> >[email protected]; [email protected]; [email protected]
> >Subject: Benjamin Kaduk's Discuss on draft-ietf-alto-performance-metrics-20:
> >(with DISCUSS and COMMENT)
>
> >Benjamin Kaduk has entered the following ballot position for
> >draft-ietf-alto-performance-metrics-20: Discuss
>
> >When responding, please keep the subject line intact and reply to all email
> >addresses included in the To and CC lines. (Feel free to cut this
> >introductory paragraph, however.)
>
>
> >Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
> >for more information about how to handle DISCUSS and COMMENT positions.
>
>
> >The document, along with other ballot positions, can be found here:
> >https://datatracker.ietf.org/doc/draft-ietf-alto-performance-metrics/
>
>
>
> >----------------------------------------------------------------------
> >DISCUSS:
> >----------------------------------------------------------------------
>
> >These should all be trivial to resolve -- just some minor internal
> >inconsistencies that need to be fixed before publication.
>
> >The discussion of percentile statistical operator in §2.2 is internally
> >inconsistent -- if the percentile number must be an integer, then p99.9 is
> >not valid.
> [Qin Wu] Yes, the percentile is a number following the letter 'p', but in
> some cases, when high precision is needed, this percentile number will be
> further followed by an optional decimal part. The decimal part should start
> with the '.' separator. Maybe the separator caused your confusion. See the
> definition in Section 2.2 for details:
> "
> percentile, with letter 'p' followed by a number:
> gives the percentile specified by the number following the letter
> 'p'. The number MUST be a non-negative JSON integer in the range
> [0, 100] (i.e., greater than or equal to 0 and less than or equal
> to 100), followed by an optional decimal part, if a higher
> precision is needed. The decimal part should start with the '.'
> separator (U+002E), and followed by a sequence of one or more
> ASCII numbers between '0' and '9'.
> "
> Let us know if you think the separator should be changed or whether you can
> live with the current form.
Oops, that's my mistake and you are correct. Sorry about that; I agree
that no change is needed here.
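(As an aside, the quoted definition is easy to check mechanically. Below is a rough Python sketch of a validator for the percentile operator string; the function name and regex are illustrative, not from the draft.)

```python
import re

# 'p' followed by an integer in [0, 100], optionally followed by a '.'
# separator (U+002E) and one or more ASCII digits, per Section 2.2.
_PERCENTILE_RE = re.compile(r"^p([0-9]+)(\.[0-9]+)?$")

def is_valid_percentile_operator(s: str) -> bool:
    m = _PERCENTILE_RE.match(s)
    # The overall value (integer plus optional decimal part) must stay
    # within [0, 100].
    return m is not None and 0.0 <= float(s[1:]) <= 100.0
```

Under this reading, "p99", "p99.9", and "p0" are valid, while "p101", "p.9", and "p100.5" are not.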
> >Also, the listing of "cost-source" values introduced by this document (in
> >§5.1) does not include "nominal", but we do also introduce "nominal".
> [Qin Wu] I agree with this inconsistency; it should be fixed in the next
> version. Thanks.
> >Similarly, in §3.1.3 we refer to the "-<percentile>" component of a cost
> >metric string, that has been generalized to an arbitrary statistical
> >operator.
> [Qin Wu] No, it is not an arbitrary statistical operator. We did add a
> statement saying:
> "
> Since the identifier
> does not include the -<percentile> component, the values will
> represent median values.
> "
> The median value has been defined in Section 2.2 as the mid-point of the
> observations; see the median definition:
> "
> median:
> the mid-point (i.e., p50) of the observations.
> "
Hmm, I am not sure whether my point came through properly or not. Let me
try again.
In Section 3.1.3, we see the text:
Comment: Since the "cost-type" does not include the "cost-source"
field, the values are based on "estimation". Since the identifier
does not include the -<percentile> component, the values will
represent median values.
This is the only place in the document where the string "<percentile>"
appears, and in particular we do not define a "percentile component"
anywhere that I can see. We do, however, define a "statistical operator"
string (component) of a cost metric string, in Section 2.2. In particular,
we do have options for the statistical operator string that are *not*
representable as percentile values, such as stddev and cur. So, I think it
is inaccurate to write "-<percentile>" component here. I propose to
instead say "Since the identifier does not include a statistical operator
component, the values will represent median values."
> >----------------------------------------------------------------------
> >COMMENT:
> >----------------------------------------------------------------------
>
> >All things considered, this is a pretty well-written document that was easy
> >to read. That helped a lot as I reviewed it, especially so on a week with a
> >pretty full agenda for the IESG telechat.
>
> >Section 2.2
>
> >Should we say anything about how to handle a situation where a base metric
> >identifier is so long that the statistical operator string cannot be
> >appended while remaining under the 32-character limit?
> [Qin Wu] I think the base metric identifier should not be randomly selected;
> the full name of the base metric is not recommended, and a short name or
> abbreviation should probably be used if the cost metric string is too long.
> But I am not sure we should set a rule for this. Maybe the rule "The total
> length of the cost metric string MUST NOT exceed 32 characters" defined in
> RFC 7285 is sufficient?
As far as formal requirements go, that may be all we need. Assuming that
no one needs a percentile value with more than two digits of precision
after the decimal point, the longest statistical operator component we
currently define is seven characters, e.g., "-median". So if someone
happens to define a base metric identifier that's more than 25 characters,
we set ourselves up for a situation where we can use the base metric but
can't use -median, -stddev, or -stdvar. If it's 28 characters or fewer we
could still use -cur, -min, -max, etc., which would be a rather strange
situation to be in!
I suspect that the right practical approach, if this situation ever arose,
would be to define a new base metric identifier that's an alias for the
existing one -- just a shorter name but with the same semantics. So we
might end up with some text like:
% RFC 7285 limits the overall cost metric identifier to 32 characters. The
% cost metric variants with statistical operator suffixes defined by this
% document are also subject to the same overall 32-character limit, so
% certain combinations of (long) base metric identifier and statistical
% operator will not be representable. If such a situation arises, it could
% be addressed by defining a new base metric identifier that is an "alias"
% of the desired base metric, with identical semantics and just a shorter
% name.
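(To make the length arithmetic above concrete, here is a small Python sketch; the names are illustrative, and the 32-character limit is the one from RFC 7285.)

```python
MAX_COST_METRIC_LEN = 32  # overall limit on cost metric strings (RFC 7285)

def can_append_operator(base_metric: str, operator: str) -> bool:
    # 'operator' includes its leading '-', e.g. "-median" or "-cur".
    return len(base_metric) + len(operator) <= MAX_COST_METRIC_LEN
```

A 25-character base metric still fits "-median" (25 + 7 = 32), a 26-character one does not, and a 28-character one still fits "-cur" (28 + 4 = 32).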
> > min:
> > the minimal value of the observations.
> > max:
> > the maximal value of the observations.
> > [...]
>
> >Should we say anything about what sampling period of observations is in
> >scope for these operators?
> [Qin Wu] I think the sampling period of observation is related to the Method
> of Measurement or Calculation. Based on earlier discussion and agreement in
> the group, we believe this depends more on the measurement methodology or
> metric definition, and specifying it here is in some cases not necessary or
> feasible; we can look into the metric definition RFCs for more details. See
> the clarification in Section 2.
Okay, that's a good way to handle it.
> >Section 3.x.4
>
> >If we're going to be recommending that implementations link to external
> >human-readable resources (e.g., for the SLA details of estimation
> >methodology), does the guidance from BCP 18 in indicating the language
> >of text come into play?
(This was a separate point from the following paragraph, to be clear. I
don't have a good answer to propose.)
> >It's also a bit surprising that we specify the new fields in the
> >"parameters" of a metric just in passing in the prose, without a more
> >prominent indication that we're defining a new field.
> [Qin Wu] See the CostContext definition in Section 2.1; "parameters" is
> included in the CostContext object.
Ah. I think I forgot that the "parameters" were new in this document;
sorry about that.
> >Section 3.1.4
>
> > "nominal": Typically network one-way delay does not have a nominal
> > value.
>
> >Does that mean that they MUST NOT be generated, or that they should be
> >ignored if received, or something else? (Similarly for the other sections
> >where we say the same thing.)
> [Qin Wu] Yes, that is my understanding. We can add a statement to make this
> behavior clear.
>
> > This description can be either free text for possible presentation to
> > the user, or a formal specification; see [IANA-IPPM] for the
> > specification on fields which should be included. [...]
>
> >Is the IANA registry really the best reference for what fields to include?
> >Tpically we would only refer to the registry when we care about the current
> >state of registered values, but the need here seems to effectively be >the
> >column headings of the registry, which could be obtained from the RFC
> >defining the registry.
> [Qin Wu] This IANA registry provides the Metric Name and Metric URI, and
> clicking through the URI provides more details of the measurement
> methodology. That is why the [IANA-IPPM] reference was selected; maybe we
> can make this clearer in the text.
>
> >Section 3.3.3
>
> > Intended Semantics: To specify spatial and temporal aggregated delay
> > variation (also called delay jitter)) with respect to the minimum
> > delay observed on the stream over the one-way delay from the
> > specified source and destination. The spatial aggregation level is
> > specified in the query context (e.g., PID to PID, or endpoint to
> > endpoint).
>
> >I do appreciate the note about how this is not the normal statistics
> >variation that follows this paragraph, but I also don't think this is a
> >particularly clear or precise specification for how to produce the number
> >that is to be reported. It also doesn't seem to fully align with the prior
> >art in the IETF, e.g., RFC 3393. It seems like it would be highly
> >preferable to pick
> >an existing RFC and refer to its specification for computing a
> >delay variation value. (To be clear, such a reference would then be a
> >normative reference.)
> [Qin Wu] Agreed; we are not introducing a new metric, we just expose the
> existing metric defined in RFC 3393. I also agree to move RFC 3393 to the
> normative references; we will see how to fix this.
> >Section 3.4.3
>
> > Intended Semantics: To specify the number of hops in the path from
> > the specified source to the specified destination. The hop count is
> > a basic measurement of distance in a network and can be exposed as
> > the number of router hops computed from the routing protocols
> > originating this information. [...]
>
> >It seems like this could get a little messy if there are multiple routing
> >protocols in use (e.g., both normal IP routing and an overlay network, as
> >for service function chaining or other overlay schemes).
> >I don't have any suggestions for disambiguating things, though, and if the
> >usage is consistent within a given ALTO Server it may not have much impact
> >on the clients.
> [Qin Wu] Hop count has been implicitly mentioned in RFC 7285; this document
> specifies the metric explicitly.
> I am thinking that which routing protocol is used can be indicated in the
> link (a field named "link") providing a URI to a description of the
> "estimation" method.
> >Section 3.4.4
>
> > "sla": Typically hop count does not have an SLA value.
>
> >As for "nominal", earlier, is there any guidance to give on not generating
> >it or what to do if it is received?
> > (Also appears later, I suppose.)
> [Qin Wu] Will see how to provide guidance on this, thanks.
> >Section 4.1.4
>
> > "estimation": The exact estimation method is out of the scope of this
> > document. See [Prophet] for a method to estimate TCP throughput. It
> > is RECOMMENDED that the "parameters" field of an "estimation" TCP
> > throughput metric provides two fields: (1) a congestion-control
> > algorithm name (a field named "congestion-control-alg"); and (2) a
> > link (a field named "link")to a description of the "estimation"
> > method. Note that as TCP congestion control algorithms evolve (e.g.,
> > TCP Cubic Congestion Control [I-D.ietf-tcpm-rfc8312bis]), it helps to
> > specify as many details as possible on the congestion control
> > algorithm used. This description can be either free text for
> > possible presentation to the user, or a formal specification. [...]
>
> >Do these specifics go into the "congestion-control-alg" name, or in the
> >linked content?
> [Qin Wu] My understanding is the latter, but the two fields will be provided
> in one "parameters" field, which can be seen as a JSON object since
> "parameters" is the plural of "parameter".
I was hoping it would be the latter :)
Maybe add a clause at the end of the last quoted sentence like ", as part of
the linked contents"?
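(For illustration, the RECOMMENDED "parameters" object could look like the following Python sketch; the algorithm name and URL are made-up values, not from the draft.)

```python
import json

# Hypothetical "cost-context" carrying the two RECOMMENDED fields for an
# "estimation" TCP throughput metric; details of the congestion control
# algorithm would live in the linked contents, not in the name itself.
cost_context = {
    "cost-source": "estimation",
    "parameters": {
        "congestion-control-alg": "cubic",
        "link": "https://alto.example.net/doc/tput-estimation.html",
    },
}
print(json.dumps(cost_context, indent=2))
```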
> >Section 5.3
>
> > To address the backward-compatibility issue, if a "cost-metric" is
> > "routingcost" and the metric contains a "cost-context" field, then it
> > MUST be "estimation"; if it is not, the client SHOULD reject the
> > information as invalid.
>
> >This seems like a sub-optimal route to backwards compatibility, as it would
> >(apparently) permanently lock the "routingcost" metric to only the
> >"estimation" source with no way to negotiate more flexibility. Unless we
> >define a new "routingcost2" metric that differs only in the lack of this
> >restriction, of course.
> [Qin Wu] Probably we should have a default value for the "cost-context";
> I think the default value is "estimation", since legacy clients only support
> metric estimation.
> >Section 5.4.1
>
> > the ALTO server may provide the client with two pieces of additional
> > information: (1) when the metrics are last computed, and (2) when the
> > metrics will be updated (i.e., the validity period of the exposed
> > metric values). The ALTO server can expose these two pieces of
> > information by using the HTTP response headers Last-Modified and
> > Expires.
>
> >While this seems like it would work okay in the usual case, it seems a bit
> >fragile, in that it may fail in boundary cases, such as when a server is
> >just starting up. I would lean towards recommending use of explicit data
> >items to convey this sort of information (and also the overall measurement
> >interval over which statistics are computed, which may not always go back to
> >"the start of time").
> [Qin Wu] Okay.
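(As a sketch of the header-based approach being discussed; the timestamp and validity period below are made-up values.)

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# When the metrics were last computed, and how long the exposed values
# remain valid (both hypothetical).
computed_at = datetime(2021, 12, 2, 9, 0, 0, tzinfo=timezone.utc)
validity_period = timedelta(minutes=5)

response_headers = {
    "Last-Modified": format_datetime(computed_at, usegmt=True),
    "Expires": format_datetime(computed_at + validity_period, usegmt=True),
}
```

Explicit data items in the response body (e.g., a measurement-interval field) would avoid the boundary cases noted above, such as a freshly started server.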
> >Section 5.4.2
>
> > often be link level. For example, routing protocols often measure
> > and report only per link loss, not end-to-end loss; similarly,
> > routing protocols report link level available bandwidth, not end-to-
> > end available bandwidth. The ALTO server then needs to aggregate
> > these data to provide an abstract and unified view that can be more
> > useful to applications. The server should consider that different
> > metrics may use different aggregation computation. For example, the
> > end-to-end latency of a path is the sum of the latency of the links
> > on the path; the end-to-end available bandwidth of a path is the
> > minimum of the available bandwidth of the links on the path.
>
> >Some caution seems in order relating to aggregation of loss measurements, as
> >loss is not always uncorrelated across links in the path.
> [Qin Wu] Agree, but here we just provide examples.
That is true ... I am approaching this from the sense that there is a pretty
nasty "gotcha" that could trip up an implementor, one that is very adjacent to
what we do talk about, so adding a caution would be only a minor change.
E.g. (after the quoted text), "In contrast, aggregating loss values is
complicated by the potential for correlated loss events on different links
in the path."
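(The aggregation rules from the quoted text, plus the loss caveat, could be sketched in Python as follows; function names are illustrative, and the loss formula assumes independent per-link losses, which is exactly the assumption that correlated losses break.)

```python
# Per-metric aggregation over the links of a path, assuming per-link
# values are available from the routing protocols.
def aggregate_latency(link_delays):
    return sum(link_delays)   # delays add along the path

def aggregate_avail_bw(link_bws):
    return min(link_bws)      # the bottleneck link limits the path

def aggregate_loss(link_loss_rates):
    # Valid only if losses on different links are independent; correlated
    # loss events need a different model.
    p_deliver = 1.0
    for p in link_loss_rates:
        p_deliver *= (1.0 - p)
    return 1.0 - p_deliver
```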
Thanks for these; all the rest (including bits above that I did not reply
to) looks good.
-Ben
> >Section 6
>
> >I thought that the outcome of the art-art review thread was that we would
> >add some mention of ordinal cost mode here as a means to mitigate the risk
> >of exposing sensitive numerical metric values, but I don't see such
> >text.
>
> >In light of the guidance in Section 7 for new cost source types to document
> >their security considerations, should we document the security
> >considerations for the "sla" type here? The overall theme would be similar
> >to what RFC 7285 already describes, but we could mention that knowledge
> >specifically of provider SLA targets allows attackers to target the SLA,
> >causing problems for the provider other than the typical DoS attack
> >class. (I'm not coming up with anything new to say about "nominal" or
> >"estimation".)
>
> >I would also consider mentioning that the "origin" references in table 1
> >might have useful things to say about the individual metrics that we use.
>
> >Giving an attacker the ability to receive the instantaneous loss rate on a
> >path could be useful in helping the attacker gauge the efficacy of an
> >ongoing attack targeting that path. The RFCs from the DOTS WG (e.g.,
> >8783 and 9132) may have some useful text on this topic that could be used as
> >a model.
>
> [Qin Wu] Good suggestion and will integrate these in the next version. Thanks.
> >Section 9.1
>
> >It's not really clear to me that [IANA-IPPM] needs to be classified as
> >normative (or whatever it is replaced by, in light of my earlier comment in
> >§3.1.4).
>
> >RFC 2330 is cited only once, in a "for example" clause; this would typically
> >cause it to be classified as only an informative reference.
>
> >The mention of RFC 8895 is conditional on it being implemented, so that
> >could probably also be downgraded to an informative reference as well.
> [Qin Wu] Okay, good suggestion.
> >Section 9.2
>
> >Some kind of URL for [Prometheus] would be very helpful.
> > [Prophet], too, though at least that has the ACM/IEEE Transactions venue to
> > anchor the reference.
> [Qin Wu] Okay.
> >I'm not entirely sure why RFC 2818 is classified as normative but RFC
> >8446 only as informative, since they are part of the same (quoted)
> >requirement clause.
> [Qin Wu] Tend to agree, will see how to fix.
> >NITS
>
> >Section 2.1
>
> To make it possible to specify the source and the aforementioned
> parameters, this document introduces an optional "cost-context" field
> to the "cost-type" field defined by the ALTO base protocol
> (Section 10.7 of [RFC7285]) as the following:
>
> I think s/"cost-type" field/CostType object/ would be slightly more accurate.
> [Qin Wu] Agree.
> The "estimation" category indicates that the metric value is computed
> through an estimation process. An ALTO server may compute
> "estimation" values by retrieving and/or aggregating information from
> routing protocols (e.g., [RFC8571]) and traffic measurement
> management tools (e.g., TWAMP [RFC5357]), with corresponding
> operational issues. [...]
>
> >I'm not sure if "with corresponding operational issues" conveys the intended
> >phrasing -- to me, it seems to say "do [the previous things], but expect
> >that there will sometimes be operational issues that make the data
> >unavailable or inaccurate".
> [Qin Wu] Yes, we will see how to rephrase this.
> >Section 2.2
>
> > stddev:
> the standard deviation of the observations.
>
>
> > stdvar:
> the standard variance of the observations.
>
> >Pedantically, we could say if these are sample or population standard
> >deviation/variance (a difference of one in the denominator), but it seems
> >very unlikely to matter for these purposes.
> [Qin Wu] I am thinking maybe this can be indicated in the link (a field named
> "link") providing a URI to a description of the "estimation" method.
> >Section 3
>
> > dropped before reaching the destination (pkt.dropped). The semantics
> > of the performance metrics defined in this section are that they are
> > statistics (percentiles) computed from these measures; for example,
>
> >I suggest "e.g., percentiles" since stddev/variance are not percentiles but
> >are statistics.
> [Qin Wu] Reasonable.
> the x-percentile of the one-way delay is the x-percentile of the set
> of delays {pkt.delay} for the packets in the stream.
>
> >This phrasing presupposes that there is a definite stream under
> >consideration, but I don't think that much confusion is likely and am not
> >sure that there's a need to change anything.
> [Qin Wu] If you have a better suggestion, please let us know.
> >Section 3.1.3
>
> >I'd perhaps make a note about the wrapping of the Accept: header field line
> >in the example (and all the other similarly affected examples).
> [Qin Wu] Okay, thanks.
> >Section 3.2.2
>
> >I suggest reusing the phrasing from §3.1.2 that mentions floating-point
> >values, for consistency.
> [Qin Wu] Okay.
> >Section 3.5.2
>
> The metric value type is a single 'JSONNumber' type value conforming
> to the number specification of [RFC8259] Section 6. The number MUST
> be non-negative. The value represents the percentage of packet
> losses.
>
> I'd probably mention floating-point here as well.
> [Qin Wu] Okay.
> Section 4.3.3
>
> Intended Semantics: To specify spatial and temporal maximum
> reservable bandwidth from the specified source to the specified
> destination. The value corresponds to the maximum bandwidth that can
> be reserved (motivated from [RFC3630] Section 2.5.7). The spatial
>
> It's a little interesting to see an OSPF reference for max reservable
> bandwidth when we used an IS-IS one for current residual bandwidth, but it's
> hard to see much harm arising from the mixture of references (who is going to
> follow the references anyway?).
>
> [Qin Wu] Correct, that is our expectation.
>
> Section 7
>
> IANA has created and now maintains the "ALTO Cost Metric Registry",
> listed in Section 14.2, Table 3 of [RFC7285]. This registry is
> located at <https://www.iana.org/assignments/alto-protocol/alto-
> protocol.xhtml#cost-metrics>. This document requests to add the
> following entries to "ALTO Cost Metric Registry".
>
> The live registry has a "reference" column, so I'd add ", with this document
> as the reference", here.
> [Qin Wu] Okay.
> Registered ALTO address type identifiers MUST conform to the
> syntactical requirements specified in Section 2.1. Identifiers are
> to be recorded and displayed as strings.
>
> s/address type/cost source type/
>
> [Qin Wu] Thanks.
>
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto