Les,

Thanks for taking the time to answer.

On 29/07/2021 at 18:26, Les Ginsberg (ginsberg) wrote:

Guillaume –

Thanx for the thoughtful response.

Responses inline.

*From:* [email protected] <[email protected]>
*Sent:* Thursday, July 29, 2021 3:20 AM
*To:* Les Ginsberg (ginsberg) <[email protected]>; [email protected]; [email protected]
*Cc:* [email protected]
*Subject:* Re: [Lsr] draft-decraene-lsr-isis-flooding-speed & IETF 111

Hello Les,

Jumping in since I have some insight as well.

On 29/07/2021 at 08:51, Les Ginsberg (ginsberg) wrote:

    Bruno –

    Resuming this thread…

    I assure you that we have the same goals.

    We are not yet in agreement on the best way to achieve those goals.

Your slides do show that we have the same goals, and we agree on one way to deal with the matter (congestion control).

    Looking forward to the WG discussion on Friday.

    To get some discussion going in advance – if you have time to do
    so (which I know is challenging especially during IETF week) – I
    call your attention to slides 15-18 in the presentation we have
    prepared:

    
    https://datatracker.ietf.org/meeting/111/materials/slides-111-lsr-21-isis-flood-scale-00

    I do not intend to present these slides during my portion of the
    presentation time – but I included them as potential points of
    discussion during the discussion portion of the meeting (though
    the WG chairs will decide how best to direct that portion of the
    time).

    I call your attention specifically to Slide 16, which discusses
    the functional elements in the input path typically seen on router
    platforms.

    Each of these elements has controls associated with it – from
    queue sizes to punt rates, etc. that play a significant role in
    delivery of incoming IS-IS PDUs to the protocol running in the
    control plane.

    Your slides –
    
    https://datatracker.ietf.org/meeting/111/materials/slides-111-lsr-22-flow-congestion-control-00


     focus only on the direct input queue to IS-IS in the control
    plane. I do not see how the state of the other staging elements on
    the path from PDU ingress to IS-IS implementation reading the PDUs
    in the control plane is known and/or used in determining the flow
    control state signaled to the transmitting neighbor. If, for
    example, PDUs were being dropped on ingress and never made it to
    IS-IS in the control plane, how would your algorithm be aware of
    this and react to this?

    In my experience, the state of these lower level staging elements
    plays a significant role.

    I can imagine that some form of signaling from the dataplane to
    the control plane about the state of these lower level elements
    could be possible – and that signaling could be used as input to a
    receiver based control algorithm. However, given the differences
    as to how individual platforms implement these lower level
    elements,  I see it as challenging to get each platform to provide
    the necessary signaling in real time and to normalize it in a way
    so that IS-IS flow control logic in the control plane can use the
    data in a platform independent way. I believe this represents a
    significant bar to successful deployment of receiver-based flow
    control.

That's one point we want to clarify: the flow control algorithm does not focus on the "IO path" between the line card and the control plane. There is no magic there; it does not directly deal with congestion on the IO path. It happens to have some nice properties even when drops occur before packets reach the control plane, but that is arguably not sufficient. That's why we also propose a congestion control algorithm. While it is not necessary to standardize it since it is only local, it helps to have a baseline for those who do not want to spend time developing their own algorithm.

Our slides also show the result of our congestion control algorithm, which is the part that deals with IO path losses. Very much like your algorithm, ours sees this IO path as a black box.

*/[LES:] This is an aspect on which I need further clarification./*

*/From the POV of the control plane, in the absence of enhanced signaling (which I believe is problematic to implement – and you seem to agree) you simply have no knowledge as to whether incoming PDUs have been dropped or are simply delayed./*

[GS] From the receiver side, absolutely.


*/On Slide 6 you say: “Sender will never send more than RWin unacknowledged LSPs”. On Slide 7 you describe how to choose RWIN. But all of these methods are fixed in size – not adaptive. And since the input queue of packets to IS-IS is not “per interface”, the number you choose for RWIN seems to have nothing at all to do with current state/neighbor./*

[GS] If by input queue you mean the last buffer before processing (what I refer to as the socket buffer), then we indeed assumed one socket per neighbor. If that is not the case, and the input queue is shared among all neighbors, you can split your available memory between neighbors (and in that case, the advertised value can indeed change, but not very often). The idea is that each sender knows the amount of space it can use on the receiver side without packets being lost.
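
As a rough illustration of the shared-queue case, the following Python sketch derives a per-neighbor RWin by splitting one input queue evenly between neighbors. The function name, the even split, and the sample numbers are assumptions made for illustration only; the thread does not say how the memory should be divided.

```python
def per_neighbor_rwin(queue_capacity_lsps, neighbor_count):
    """Return the RWin (in LSPs) to advertise to each neighbor.

    Hypothetical helper: assumes the shared input queue is split evenly,
    so the sum of all advertised windows never exceeds the queue size.
    """
    if neighbor_count <= 0:
        raise ValueError("need at least one neighbor")
    # Floor division keeps the total at or below the queue capacity.
    return queue_capacity_lsps // neighbor_count


# Made-up numbers: a queue holding 256 LSPs shared by 10 neighbors.
rwin = per_neighbor_rwin(256, 10)
print(rwin, rwin * 10 <= 256)
```

The advertised value only needs to change when the neighbor count (or the queue size) changes, which matches the "not very often" remark above.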


*/You then go on to discuss RTT (which seems to be a configured or pre-determined value??) and LPP (LSPs acked/PSNP) – which is only meaningful for the LSPs that have actually made their way to IS-IS – no way to account for those that have been dropped or delayed./*

[GS] RTT is useful for analysis purposes, not for the RWin algorithm itself: if you bound the number of unacked LSPs, you naturally bound your bandwidth.

The dropped LSPs are actually taken into account.

 * Implicitly for RWin: since dropped LSPs are unacked LSPs from the
   sender's POV, they limit the number of LSPs sent afterwards.
 * Explicitly for Congestion Control, since they will arrive late/never
   and a congestion signal will get triggered on the sender side.
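
The implicit effect in the first bullet can be sketched with a small piece of sender-side accounting. This is a minimal illustration, not the draft's implementation: the class and method names are hypothetical, and retransmission and the congestion control part are deliberately left out.

```python
class RWinSender:
    """Hypothetical sender-side window accounting (illustrative names)."""

    def __init__(self, rwin):
        self.rwin = rwin          # window advertised by the receiver
        self.unacked = set()      # LSP ids sent but not yet acknowledged

    def free_slots(self):
        return self.rwin - len(self.unacked)

    def send(self, lsp_id):
        """Record a transmission; refuse it if the window is full."""
        if self.free_slots() <= 0:
            return False
        self.unacked.add(lsp_id)
        return True

    def on_psnp(self, acked_ids):
        """A PSNP acknowledges some LSPs and frees their slots."""
        self.unacked.difference_update(acked_ids)


sender = RWinSender(rwin=10)
for lsp_id in range(10):
    sender.send(lsp_id)

# Assume LSPs 7, 8 and 9 were dropped on the IO path: only 0..6 get acked.
sender.on_psnp(range(7))
print(sender.free_slots())
```

Until the three dropped LSPs are retransmitted and acknowledged, they keep occupying window slots, so the sender can have at most seven new LSPs outstanding.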


*/So it is difficult for me to understand how you are actually accounting for the current state of the I/O path in real time on a per neighbor basis./*


    This is one major reason why we prefer a Tx based flow control
    mechanism. Tx based flow control simply focuses on the relative
    rates of LSP transmission and reception of Acknowledgments.
    Whether slower acks are due to PDU drops at ingress, slow punt
    path operation, lack of CPU time for IS-IS to process the incoming
    packets, etc. matters not. Whatever the reason, if acks are not
    keeping pace with transmission we know we have to slow down.

As you can see in the slides & draft, we also have a congestion control algorithm and we show the results. This congestion control algorithm only works with the ACKs (like yours), and gives results in the case of "IO congestion" (like yours).

*/[LES:] As best I understand it, ACKs in your case are from the receiver’s POV – which makes it dependent on what LSPs the receiver has actually seen./*

*/In the TX based algorithm, we don’t care/know whether the receiver has seen anything – we just know whether we have got timely ACKs or not. And since it is possible that drops/delays could occur on the Tx side as well as the Rx side, this approach seems much more robust./*

[GS] I need some clarification here. I am indeed talking about Sender to Receiver LSPs and Receiver to Sender PSNPs.

If you talk about timely ACKs, they also come from the Receiver, right? So the Receiver needs to see the LSPs as well. I don't think our approaches are different on that particular point. The difference is that you control bandwidth, while we limit the number of unacked LSPs.

Flow and congestion control are not mutually exclusive; in fact it is almost certain that both will be necessary at some point. The main benefit of limiting the number of unacked packets in flight is to avoid losing packets in case of CPU contention. As this should be a common situation (in part for the reasons on your slide 15), flow control as we propose seems very relevant.

For example, in your Slowing Down scenario, if the slowing down occurs at the Control Plane, a congestion control algorithm will lose packets. If on top of your algorithm, you limit the number of unacked LSPs (flow control), these losses cannot occur anymore as the sender will stop sending before overflowing the socket buffer. It's an (almost) free win.

*/[LES:] In both approaches it is the control plane that is adapting. And Tx based approach does react to the number of unacked packets./*

[GS] Your approach does track but does not limit the number of unacked packets, thus allowing for the losses in the above scenario. If you add a maximum number of unacked LSPs based on the knowledge of the receiver's input queue, you avoid losing packets in the above scenario.


In addition, slide 17 talks about signalling in real time; I am unsure of your point. As the socket size is static (or at least long lived), there is no need to change the advertised value in real time. Maybe the previous explanations helped clarify the proposed changes. I don't really understand the point of slide 18 either. I would be interested in more details.

*/[LES:] The point of Slide 17 is that if we want the algorithm to react to transient changes in the receiver’s capability (e.g., due to bursts unrelated to IS-IS LSP activity), to be effective we have to do so quickly. And since this coincides with the high input of IS-IS PDUs, the likelihood of delays in processing the PDUs used to send the signal is higher than normal. Look at our Slide #8 Row #2 for an example of the consequences of not adapting quickly. /*

*/If, as you say, there is no need to signal in real time, this tells me that you are simply advertising conservative values not based on the actual real time performance, in which case the issues highlighted on our Slide #15 are relevant. You seem to be acknowledging you have no intent to adapt to transient changes – you are simply going to limit things to something you have determined via offline evaluation should be safe. But such values have to account for the “worst case” in terms of # of neighbors, # of LSPs in the network, …all the things listed on our Slide #15 – so either they are overly conservative, or the customer somehow has to determine what value to configure based on the network. So you aren’t actually proposing anything adaptive at all it seems. ???/*

[GS] For RWin, even though the advertised value (in the proposed new TLV) is static, the algorithm still reacts to ACKs, and will naturally pace the sending to the Receiver ACKs.

Let's take a simple case: a receiver advertises an RWin of 10 and, to simplify, sends 1 PSNP as soon as it has processed an LSP. The sender uses only RWin (for now). It has a large number of LSPs to send. It sends its first 10 LSPs, then stops.

The receiver processes the first LSP, sends a PSNP, then the second LSP etc. The PSNPs are paced to the processing rate of the receiver control plane (since it sends them as soon as it can).

The sender then sees the first PSNP coming and knows LSP #1 has been acked: it can safely assume that this LSP is no longer in the input queue, and send another one. The same goes for the following LSPs, and since the PSNPs were paced by the receiver control plane, the sender automatically adapts to this rate.

If the receiver is busy at some point, it won't send PSNPs anymore, thus halting the sending of LSPs as well. Exactly what is needed for the cases you describe.

This is why this algorithm performs particularly well under CPU contention. This is also why this algorithm is very dependent on sending PSNPs as fast as possible.

So yes, the advertised value itself is not dynamic, but the sending rate is paced dynamically (_the signal is not in our TLV, it comes implicitly in the PSNPs_).
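
The RWin = 10 walk-through above can be replayed with a toy tick-based simulation. Everything in this sketch is an assumption made for illustration: one LSP processed per tick, PSNPs delivered instantly, no losses, and a set of "busy" ticks during which the receiver's CPU processes nothing. It is not the implementation evaluated in the slides.

```python
from collections import deque


def simulate(total_lsps, rwin, busy_ticks):
    """Return (LSPs sent per tick, max input-queue depth).

    Toy model: the sender fills the window at the start of each tick;
    the receiver then processes one LSP and acks it at once, unless the
    tick is listed in busy_ticks (CPU contention).
    """
    queue = deque()        # receiver input queue
    next_lsp = 0           # next LSP id the sender will transmit
    in_flight = 0          # sent but not yet acknowledged
    acked = 0
    max_queue = 0
    sent_per_tick = []
    tick = 0
    while acked < total_lsps:
        sent = 0
        while in_flight < rwin and next_lsp < total_lsps:
            queue.append(next_lsp)
            next_lsp += 1
            in_flight += 1
            sent += 1
        sent_per_tick.append(sent)
        max_queue = max(max_queue, len(queue))
        if tick not in busy_ticks and queue:
            queue.popleft()    # LSP processed, PSNP sent immediately
            in_flight -= 1
            acked += 1
        tick += 1
    return sent_per_tick, max_queue


sent, deepest = simulate(total_lsps=30, rwin=10, busy_ticks={5, 6, 7})
print(sent[:10], deepest)
```

In this model the sender bursts RWin LSPs, then sends one LSP per PSNP received, and goes quiet once the window is full while the receiver is busy; the input queue never holds more than RWin LSPs.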

It does not work well when losses occur elsewhere (i.e., not due to CPU contention). In that case, there is absolutely a need to control the rate, either explicitly (as you do), or implicitly (as we do, by adapting the number of unacked LSPs we allow). But I don't think the congestion control algorithm itself should be the focus of this discussion.


*/Slide 18 is highlighting the differences between operation of a TCP session and operation of IS-IS Update process. One of the arguments used for the Rx based approach has been “this is the way TCP has done it for years”. We are just highlighting why that isn’t a very good analogy. To be fair, I note you have acknowledged some (but not all) of this in your presentation as well e.g., on your Slide 25 you acknowledge that “packet reordering” isn’t applicable to IS-IS./*

[GS] It's true that the reasoning is not that simple, but I stand by the result. There are still lots of things to take from TCP, as it is the /de facto/ playground of congestion control algorithms IRL...


Thanks for your remarks,

*/[LES:] Thank you for your response. I hope we can continue this dialogue./*

[GS] Agreed !

Guillaume


*/   Les/*

Guillaume

    Please comment on these points as you have time.

    Thanx.

       Les

    *From:* [email protected] <[email protected]>
    *Sent:* Monday, July 12, 2021 1:48 AM
    *To:* Les Ginsberg (ginsberg) <[email protected]>; [email protected]
    *Cc:* [email protected]
    *Subject:* RE: draft-decraene-lsr-isis-flooding-speed & IETF 111

    Les,

    Faster flooding may be achieved without protocol extension. But if
    we are changing flooding anyway, it would be reasonable to try to make
    it good (rather than just faster than today).

    In particular some goals are:

    - faster flooding when the receiver has free cycles

    - slower flooding when the receiver is busy/congested (either by
    flooding, or any CPU computation including not coming from IS-IS)

    - avoiding/minimizing the parameters that the network operator is
    asked to tune

    - avoiding/minimizing the loss of LSPs

    - robust to a wide variety of conditions (good ones and bad ones)

    You seem to agree on changing the flooding behaviour on both the
    sender and the receiver so that they can better cooperate. That’s
    protocol extension to me (and IMHO much bigger than the sending of
    info in one TLV)

    Bruno

    *From:* Les Ginsberg (ginsberg) [mailto:[email protected]]
    *Sent:* Friday, July 9, 2021 7:49 PM
    *To:* DECRAENE Bruno INNOV/NET <[email protected]>; [email protected]
    *Cc:* [email protected]
    *Subject:* RE: draft-decraene-lsr-isis-flooding-speed & IETF 111

    Bruno –

    Neither of us has presented anything new of substance in the last
    few IETFs.

    There were two presentations recently - one by Arista and one by
    Huawei – each of which simply demonstrated that it is possible to
    flood faster - and that in order to do so it is helpful to send
    acks faster - both points on which there is no disagreement.

    To have a productive discussion we both need to present new data -
    which is why having the discussion as part of the meeting at which
    the presentations occur makes sense to me.

    We removed the example(sic) algorithm from our draft because it
    was only an example, is not normative, and we did not want the
    discussion of our approach to be bogged down in a debate on the
    specifics of the example algorithm.

    Based on your response, seems like we were right to remove the
    algorithm. 😊

    Regarding WG adoption, one of the premises of our draft is that
    faster flooding can be achieved w/o protocol extensions and so
    there is no need for a draft at all. I am sure we do not yet agree
    on this - but I do hope that makes clear why adopting either draft
    at this time is premature.

       Les

    *From:* [email protected] <[email protected]>
    *Sent:* Friday, July 9, 2021 9:15 AM
    *To:* Les Ginsberg (ginsberg) <[email protected]>; [email protected]
    *Cc:* [email protected]
    *Subject:* RE: draft-decraene-lsr-isis-flooding-speed & IETF 111

    Les,

    > *From:* Les Ginsberg (ginsberg) [mailto:[email protected]]

    […]

    > I also think it would be prudent to delay WG adoption

    For how long exactly would it be “prudent to delay WG adoption”?
    (in addition to the past two years)

    Until what condition?

    It’s been two years now since
    draft-decraene-lsr-isis-flooding-speed brought this subject to the
    WG (and even more in private discussions).

    Two years during which we have presented our work to the WG,
    discussed your comments/objections, been asked to provide more
    data and consequently worked harder to implement it and obtain
    evaluation results.

    What precisely is the bar for initiating a call for WG adoption?

    We have data proving the benefit, so after those two years, what
    are your clear and precise _/technical/_ objections to the
    mechanism proposed in draft-decraene-lsr-isis-flooding-speed?

    Coming back to draft-decraene-lsr-isis-flooding-speed,

    we have a specification and the flow control part is stable.

    We have an implementation and many evaluations demonstrating that
    flow control alone is very effective in typical conditions.

    We have an additional congestion control part which is still being
    refined, but this part is a local behavior which does not necessarily
    need to be standardized and which is mostly useful when the
    receiver of the LSPs is not CPU-bound, which does not seem to be
    the case from what we have seen (in most cases, receivers
    are CPU-bound; in fact, we needed to artificially create I/O
    congestion in order to evaluate the congestion control part).

    Regarding your draft, in the latest version of your draft,
    published yesterday, you have removed the specification of your
    proposed congestion control algorithm… Based on this, I don’t see
    how technical discussion and comparison of the specification can
    be achieved.

    You have an implementation. This is good to know and we are ready
    to evaluate it under the same conditions as our implementation,
    so that we can compare the data. Could you please send us an
    image? We may be able to have data for the interim.

    --Bruno

    *From:* Les Ginsberg (ginsberg) [mailto:[email protected]]
    *Sent:* Friday, July 9, 2021 5:00 PM
    *To:* DECRAENE Bruno INNOV/NET <[email protected]>; [email protected]; [email protected]
    *Subject:* RE: draft-decraene-lsr-isis-flooding-speed & IETF 111

    As is well known, there are two drafts in this problem space:

    https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/

    and

    https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

    Regarding the latter, we also have a working implementation and we
    also have requested a presentation slot for IETF 111 LSR WG meeting.

    I agree with Bruno that the time available in the WG meeting will
    likely be inadequate to present full updates for both drafts. In
    addition, I think it is important that the WG have
    an opportunity to discuss publicly, in an interactive way, the
    merits of each proposal. The likelihood that time will be
    available in the scheduled WG meeting for that discussion as well
    seems low.

    I therefore join w Bruno in suggesting that an interim meeting
    dedicated to the flooding speed topic be organized.

    Given the short time available before IETF 111, I would suggest
    that we look at scheduling an interim meeting after IETF 111 - but
    I leave it to the WG chairs to decide when to schedule this.

    I also think it would be prudent to delay WG adoption calls for
    either draft until after such an interim meeting is held. In that
    way the WG can make a more informed decision.

       Les

    *From:* Lsr <[email protected]> *On Behalf Of* [email protected]
    *Sent:* Friday, July 9, 2021 2:01 AM
    *To:* [email protected]; [email protected]
    *Subject:* [Lsr] draft-decraene-lsr-isis-flooding-speed & IETF 111

    Hi chairs, WG,

    Over the last two years, we have presented, and the WG has discussed,
    draft-decraene-lsr-isis-flooding-speed at IETF 105 and “107”.

    IETF 105: https://datatracker.ietf.org/meeting/105/proceedings#lsr
    Note that the presentation is in the first slot/video but a large
    part of the discussion is in the second one.

    IETF 107/interim:
    https://datatracker.ietf.org/meeting/interim-2020-lsr-02/materials/agenda-interim-2020-lsr-02-lsr-01-07.html

    The goal is to improve flooding performance and robustness to make
    it both faster when the receiver has free cycles, and slower when
    the receiver is congested.

    In addition to technical discussions, one piece of feedback was that
    implementation and tests/evaluation would be good in order to
    evaluate the proposal.

    We are reporting that we have an implementation of [1] based on
    the open source Free Range Routing implementation.

    We are now ready to report the evaluation to the WG. We have a lot
    of data so ideally would need around an hour in order to cover the
    whole picture.

    We have requested a slot for the IETF 111 LSR meeting. If the IETF 111
    slot is short, we’d like to request an interim meeting. In
    order to keep the context, the closer to IETF 111 the
    better.

    Since we have an implementation, we have requested a code
    point, in order to avoid squatting on one. This is currently under
    review by the designated experts.

    Finally, given the two years of work, the specification, the
    implementation and extensive evaluation, we’d like to ask for WG
    adoption.

    Thanks,

    Regards,

    --Bruno

    [1]
    https://datatracker.ietf.org/doc/html/draft-decraene-lsr-isis-flooding-speed

    
_________________________________________________________________________________________________________________________

    Ce message et ses pieces jointes peuvent contenir des informations
    confidentielles ou privilegiees et ne doivent donc

    pas etre diffuses, exploites ou copies sans autorisation. Si vous
    avez recu ce message par erreur, veuillez le signaler

    a l'expediteur et le detruire ainsi que les pieces jointes. Les
    messages electroniques etant susceptibles d'alteration,

    Orange decline toute responsabilite si ce message a ete altere,
    deforme ou falsifie. Merci.

    This message and its attachments may contain confidential or
    privileged information that may be protected by law;

    they should not be distributed, used or copied without authorisation.

    If you have received this email in error, please notify the sender
    and delete this message and its attachments.

    As emails may be altered, Orange is not liable for messages that
    have been modified, changed or falsified.

    Thank you.

    



    _______________________________________________
    Lsr mailing list
    [email protected]
    https://www.ietf.org/mailman/listinfo/lsr


