Hi Bruno,

From: Lsr <lsr-boun...@ietf.org> on behalf of Bruno Decraene 
<bruno.decra...@orange.com>
Date: Monday, April 27, 2020 at 8:15 AM
To: Robert Raszuk <rob...@raszuk.net>
Cc: "Les Ginsberg (ginsberg)" <ginsberg=40cisco....@dmarc.ietf.org>, 
"lsr@ietf.org" <lsr@ietf.org>, Tony Przygienda <tonysi...@gmail.com>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Robert,

From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: Monday, April 27, 2020 12:09 PM
To: DECRAENE Bruno TGI/OLN
Cc: Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed


> Slow flooding increases the likelihood of multiple IGP SPF computations

True.

But if you keep your IGP nicely organized in areas and levels, and get rid of 
flooding anything (incl. /32s) domain-wide to address bugs in the MPLS 
architecture, then your flooding radius is usually very small.
[Bruno] First of all, the use of areas/levels brings tradeoffs. Then, after 
their initial design, networks grow and change.

Coming back to flooding, if you have a core router with 50 IGP neighbors, the 
failure of this router requires flooding 50 LSPs. At 33ms pacing between LSPs 
that’s a 1.6s delay/tax, before any computation & FIB update. As you see, it’s 
not related to the number of /32s nor the network diameter.
Some may be fine with this additional 1.6s. Some may not.
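
The arithmetic behind the 1.6s figure can be checked with a minimal sketch. It assumes the first LSP leaves at t=0 and each following LSP waits one pacing interval; the function name and the integer-millisecond units are illustrative choices, not taken from any implementation.

```python
def pacing_delay_ms(lsp_count: int, pacing_ms: int = 33) -> int:
    """Time (ms) until the last of lsp_count LSPs leaves the sender.

    Assumption: the first LSP is sent at t=0 and every later LSP waits
    one pacing interval after the previous one.
    """
    if lsp_count <= 0:
        return 0
    return (lsp_count - 1) * pacing_ms


# 50 LSPs paced 33 ms apart, as in the example above.
delay = pacing_delay_ms(50)
print(f"{delay} ms (~{delay / 1000:.1f} s) before the last LSP is sent")
```

The result depends only on the number of LSPs and the pacing interval, which is the point made above: neither the number of prefixes nor the network diameter appears in it.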

I’m not nearly as familiar with IS-IS deployments as OSPF. Are there any 
implementations that don’t offer configuration to override the 33ms inter-LSP 
interval? At Redback (circa 2000), our OSPF implementation defaulted to fast 
flooding and for the MinLSInterval and MinLSArrival OSPF values, you had to 
explicitly remove the fast flooding default if you wanted to follow RFC 2328. 
Thanks,
Acee

Best
--Bruno


That in turn allows for both fast flooding and fast topology computation while 
only dealing with few external summaries. I am yet to see a practical case 
where a well designed network with today's ISIS requires flooding speedup.

Best,
R.




On Mon, Apr 27, 2020 at 10:34 AM 
<bruno.decra...@orange.com> wrote:

>  ISIS flooding churn (and room for optimization) becomes a problem when node 
> boots up (IMHO this is not a problem) and when node fails while having many 
> neighbors attached. Yes maybe second case could be improved but well designed 
> and operated network should have pre-programmed bypass paths against such 
> cases so IMO stressing IGP to "converge" faster while great in principle may 
> not be really needed in practice.


I don’t think that FRR is a replacement for “fast” (I’d rather say adequate) 
IGP convergence & flooding.

For multiple reasons such as:

- Multiple ‘things’ depend on the IGP, such as BGP best path 
selection, CSPF/TE/PCE computations, FRR computations

- Slow flooding increases the likelihood of multiple IGP SPF 
computations, which is harmful for other computations which are typically 
heavier and manifold (cf. above)

- Multiple IGP SPF computations also create multiple transient 
forwarding loops. There are some techniques to remove forwarding loops but this 
is still an advanced topic and some implementations do not handle consecutive 
IGP SPFs (with ‘overlapping’ convergences and combined distributed forwarding 
loops)

- For FRR, you mostly need to pre-decide/configure whether you want to 
protect against link or node failures. Tradeoffs are involved and, given the 
probability of events, link protection is usually enabled (hence not node 
protection)

- …

Also, given the current “state of the art”, there is no stressing involved. 
Really. Using TCP, my 200€ mobile running on battery and over 
wifi+ADSL+Internet can achieve better communication throughput than a n*100k€ 
high end IS-IS router.
I think many people agree that IS-IS could do better in terms of flooding 
(possibly not as good as a brand new approach, but incremental improvements 
also have some benefits). Eventually, we don’t need everyone to agree on this.



>  PS. Does anyone have a pointer to any real data showing that performance of 
> real life WAN ISIS deployments is bad ?

In some of our ASes, we do monitor IS-IS by listening to and recording flooded 
LSPs. I can’t share any data.
Next question could be what is “good enough”. I guess this may depend on the 
size of your network, its topology, and your requirements.

We also ran tests in labs. I may share some results during my presentation. (no 
names, possibly no KPI, but some high level outcomes).

Regards,
Bruno


From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: Friday, April 24, 2020 12:42 PM
To: DECRAENE Bruno TGI/OLN
Cc: Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Bruno  & all,

[Bruno] On my side, I’ll try once and I think the LSR WG should also try to 
improve IS-IS performance. Maybe if we want to move, we should first release 
the brakes.

Well, from my observations, releasing the brakes means increasing the risks.

Take BGP - brakes are off and see what happens :)

My personal observation is that ISIS implementations across vendors are just 
fine for the vast majority of deployments today. That actually also includes 
the vast majority of compute clusters, as they consist of max 10s of racks.

Of course there are larger clusters with 1000+ or 10K and above network 
elements itself and x 20 L3 computes, but is there really a need to stretch 
protocol to accommodate those ? Those usually run BGP anyway. And also there is 
DV+LS hybrid too now.

ISIS flooding churn (and room for optimization) becomes a problem when node 
boots up (IMHO this is not a problem) and when node fails while having many 
neighbors attached. Yes maybe second case could be improved but well designed 
and operated network should have pre-programmed bypass paths against such cases 
so IMO stressing IGP to "converge" faster while great in principle may not be 
really needed in practice.

Last I am worried that when IETF defines changes to core protocol behaviour the 
quality of the implementations of those changes may really differ across 
vendors overall resulting in much worse performance and stability as compared 
to where we are today.

I am just not sure if possible gains for few deployments are greater than the 
risk for 1000s of today's deployments. Maybe one size does not fit all and for 
massive scale ISIS we should define a notion of "ISIS-DC-PLUGIN" which can be 
optionally in run time added when/if needed. If that requires protocol changes 
to accommodate such dynamic plugins - that work should take place.

Many thx,
R.

PS. Does anyone have a pointer to any real data showing that performance of 
real life WAN ISIS deployments is bad ?


On Fri, Apr 24, 2020 at 11:26 AM 
<bruno.decra...@orange.com> wrote:
Tony

From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Thursday, April 23, 2020 7:29 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org; Les Ginsberg (ginsberg)
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

I was referring to RFC4960. Bruno, for all practical purposes I think that seems 
to go down the path of trying to reinvent RFC4960 (or ultimately use it).
[Bruno] I don’t think that SCTP (RFC4960) is a better fit than TCP. Many more 
features and options than TCP, way more than needed given the existing IS-IS 
flooding mechanism. Much less implementation experience and improvement than 
TCP.
Or, changing the packet formats heavily to incorporate all the control loop 
data you need to the point you have a different control channel along those 
lines since you'll find most of the problems RFC4960 is describing (minus stuff 
like multiple paths).
[Bruno] Really, adding one sub-TLV in IS-IS is not “changing the packet formats 
heavily”.
Nothing wrong with that but it's ambitious on a 30-year-old antique artefact 
we're nursing forward here ;-)
[Bruno] I’m perfectly fine with revolution approaches. I think that we can also 
provide incremental improvement to IS-IS.
As entertaining footnote, I saw in last 20 years at least 3 attempts to allow 
multiple TCP sessions in BGP between peers to speed/prioritize things up. All 
failed, after the first one I helped to push I smarted up ;-)
[Bruno] On my side, I’ll try once and I think the LSR WG should also try to 
improve IS-IS performance. Maybe if we want to move, we should first release 
the brakes. I’ve seen some progress, e.g., from “there is no need to improve 
flooding” to “we all agree to improve flooding”, or from “Network operators just 
need to configure existing CLI” to “We agree that we need something more 
automated/dynamic”. But this has been very slow progress over a year.

--Bruno

As another footnote: I looked @ all the stuff in RIFT (tcp, quic, 4960, more 
ephemeral stuff). I ended up adding to rift bunch very rudimentary things and 
do roughly what Les/Peter/Acee started to write (modulo algorithm I contributed 
and bunch things that would be helpful but we can't fit into existing packet 
format). This was as much decision as to "what's available & well debugged" as 
well as "does it meet requirements" as "how complex is that vs. rtx in flooding 
architecture  we have today + some feedback". Not on powerpoint, in real 
production code ;-) rift draft shows you the outcome of that as IMO best 
trade-off to achieve high flooding speeds ;-)

my 2c

-- tony

On Thu, Apr 23, 2020 at 10:15 AM 
<bruno.decra...@orange.com> wrote:
Tony,
Thanks for engaging.
Please inline [Bruno2]



From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Wednesday, April 22, 2020 9:25 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org; Les Ginsberg (ginsberg)
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed



On Wed, Apr 22, 2020 at 11:03 AM 
<bruno.decra...@orange.com> wrote:
Tony, all,

Thanks Tony for the technical and constructive feedback.
Please inline [Bruno]

From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Wednesday, April 22, 2020 1:19 AM
To: Les Ginsberg (ginsberg)
Cc: DECRAENE Bruno TGI/OLN; lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

neither am I aware of anything like this (i.e. per platform/product flooding 
rate constants) in any major vendor stack for whatever that's worth. It's 
simply unmaintainable, point. All major vendors have extensive product lines 
over so many changing hardware configuration/setups it is simply not viable to 
attempt precise measurements (and even then, user changing config can throw the 
stuff off in a millisecond). There may have been here and there per deployment 
scenario some "recommended config" (not something I immediately recall either) 
but that means very fixed configuration of things & how they go into networks 
and even then I'm not aware of anyone having had a "precise model of the chain 
in the box". yes, probes to measure lots and lots of stuff in the boxes exist 
but again, those are chip/linecard/backplane/chassis/routing engine specific 
and mostly used in complex test/performance scenarios and not to derive some 
kind of equations that can predict anything much ...
[Bruno] Good points.
Yet, one piece of information that we propose the LSP receiver advertise to 
the LSP sender is the Receive Window.

- This is a very common parameter and algorithm. Nothing fancy nor 
reinvented. In particular it’s a parameter used by TCP.

- I would argue that TCP implementations also run on a variety of 
hardware and systems, a much wider range than IS-IS platforms. And those TCP 
implementations on all those platforms manage to advertise this parameter (TCP 
window).

- I fail to understand why, when some WG contributors proposed the use 
of TCP, nobody said that determining and advertising a Receive Window would be 
an issue, difficult or even impossible. But when we propose to advertise a 
Receive Window in an IS-IS TLV, this becomes difficult or even impossible for 
some platforms. Can anyone help me understand the technical difference?
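
To make the Receive Window idea concrete, here is a minimal sketch of a sender that never keeps more unacknowledged LSPs in flight than the window advertised by the receiver. It is only an illustration of window-based flow control in general: the class and method names are hypothetical, the window is assumed to be counted in LSPs, and nothing here is taken from the draft or from any vendor implementation.

```python
from collections import deque


class WindowedLspSender:
    """Toy sender limited by a receiver-advertised window (unit: LSPs)."""

    def __init__(self, receive_window: int):
        self.receive_window = receive_window  # assumed advertised by the receiver
        self.pending = deque()                # LSP IDs waiting to be sent
        self.in_flight = set()                # sent but not yet acknowledged

    def enqueue(self, lsp_id):
        self.pending.append(lsp_id)

    def transmit(self):
        """Send as many pending LSPs as the window allows; return their IDs."""
        sent = []
        while self.pending and len(self.in_flight) < self.receive_window:
            lsp_id = self.pending.popleft()
            self.in_flight.add(lsp_id)
            sent.append(lsp_id)
        return sent

    def on_ack(self, acked_ids):
        """Acknowledgements (e.g. carried in a PSNP) free window space."""
        self.in_flight.difference_update(acked_ids)


sender = WindowedLspSender(receive_window=10)
for lsp in range(50):
    sender.enqueue(lsp)

first_burst = sender.transmit()    # limited to the advertised window
sender.on_ack(first_burst[:4])     # receiver acknowledges four LSPs
second_burst = sender.transmit()   # only the freed space is reused
print(len(first_burst), len(second_burst))
```

The sending rate here follows the acknowledgements rather than a fixed per-LSP timer, so a fast receiver is served quickly and a slow one is not overrun.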


Bruno, I was waiting for that ;-)
[Bruno2] Good ;-)

Discounted for the fact that I'm not a major TCP expert: TCP is a very 
different beast. it has a 100-200msec fast timer & 500msec slow (which have to 
be quite accurate, it's really one timer for all connections + mbuf and other 
magic) and it sends a _lot_ of packets back and forth with window size 
indications so the negotiation can happen very quickly.  Also, TCP can detect 
losses based on sequence number received contrary to routing protocols (that's 
one of the things we added in RIFT BTW) and it can retransmit quickly when it 
sees a "hole". Contrary to that in ISIS ACKs may or may not come, they may be 
bundled, hellos may or may not come and we can't retransmit stuff on 100msec 
timers either. It's an utterly different transport channel.
[Bruno2] I would distinguish two things, which I think we have done in 
https://tools.ietf.org/html/draft-decraene-lsr-isis-flooding-speed-03

- How fast you can adapt the sending rate. This seems mostly dependent 
on the speed of the feedback loop, rather than the format of the message. E.g. in 
IS-IS the receiver can give feedback more or less quickly (e.g. depending on 
how fast/bundled it sends the PSNP). In theory, I don’t see a major difference. 
From an implementation standpoint, I’m guessing that the difference is 
probably bigger (e.g. TCP is probably lower level/closer to the 
system/hardware than IS-IS, which is more user space and possibly Platform 
Independent in some organizations).

- How fast you can detect packet loss. I agree that TCP & IS-IS are 
very different on this. We have proposed to improve this by allowing the 
receiver to advertise to the sender how fast it will ack the LSP. Currently the 
timer/behavior is known to the receiver but not to the sender. Hence the sender 
needs to assume the worst case (ISO default).
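
The second point can be sketched as follows: if the receiver advertises how quickly it acknowledges LSPs, the sender can derive a shorter retransmission timeout instead of always assuming the worst case. This is a hedged illustration only; the function, its parameters and the numbers in the usage lines are hypothetical placeholders, not values from the draft or from ISO 10589.

```python
from typing import Optional


def retransmit_timeout_ms(advertised_ack_delay_ms: Optional[int],
                          worst_case_ms: int,
                          margin_ms: int = 0) -> int:
    """Pick a retransmission timeout for an unacknowledged LSP.

    advertised_ack_delay_ms: ack delay advertised by the receiver,
        or None if the receiver advertises nothing.
    worst_case_ms: conservative value the sender must assume otherwise.
    margin_ms: hypothetical allowance for transit and processing time.
    """
    if advertised_ack_delay_ms is None:
        return worst_case_ms
    # Never wait longer than the conservative value.
    return min(worst_case_ms, advertised_ack_delay_ms + margin_ms)


# Placeholder numbers, chosen only to show the effect.
print(retransmit_timeout_ms(None, worst_case_ms=5000))
print(retransmit_timeout_ms(200, worst_case_ms=5000, margin_ms=50))
```

With no advertisement the sender falls back to the conservative value; with one, loss is suspected after the advertised delay plus the margin.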

In more abstract terms, TCP is a sliding N-window protocol (where N is adjusted 
all the time & losses can be efficiently detected) whereas LSR flooding is not 
a windowing protocol (or rather LSDB-size window protocol with selective 
retransmission but no detection of loss [or only very slow based on lack of ACK 
& CSNPs]). Disadvantage of something like TCP (I think I sent out the RFC with 
UDP control protocol work to make it more TCP like)
[Bruno2]  If you are referring to DCCP (Datagram Congestion Control Protocol) 
(RFC 4340), yes you did and thank you for this. Constructive feedback.

- Regarding flow control, I’ve quickly looked at DCCP and it does not 
provide flow control.

- Regarding congestion control, possibly the algorithm part may be 
reused. There are two algorithms and DCCP is open to others. Maybe one question 
is how much we want IS-IS to be fair to TCP (control plane TCP, not 
dataplane/user plane TCP). To me, IS-IS is more important than BGP traffic, 
given their relative importance to the network, their delay requirements, their 
typical volume of traffic. But that is probably a “detail” down the road. This 
also depends on whether TCP & IS-IS compete for the same resources (e.g. same 
queue) or not (ideally TCP and IS-IS have different queues).

is that you are stuck when you put something into the pipe, no prioritization 
possible and if receiver is slow you may have multiple obsolete copies in the 
pipe waiting & lots retransmission BW when holes are punched into the data 
through loss. And plain TCP is actually quite bad for control protocol traffic 
@ scale, almost all vendors run a special version of it for BGP for that reason. 
Why that is is out of scope of this list I think ... Flooding is really good to 
send lots of data prioritized/in parallel but on losses re-TX is slow.
[Bruno2] Good that we seem to make the same distinction between the control 
loops for the sending rate vs the retransmission.
Regarding clarifying distinctions, the draft may need to better introduce the 
distinction between flow control and congestion control, at least to structure 
the work and the discussion.

Thanks
--Bruno
Bruno, if you're so deeply interested in that stuff we can talk 1:1 off-line 
about implementation work on rift towards adaptable flooding rate
[Bruno] Sure. My pleasure. Please propose some time slots offline. Please note 
that I’m based in Europe (CEST), so a priori during your morning and my evening.
If you can also extend the offer to discuss the implementation work on the 
IS-IS implementation of your employer with regards to adaptable flooding rate, 
and/or how network operators can configure the CLI parameters of the LSP senders 
so as to improve flooding rate without overloading the Juniper receiver 
(possibly depending on the capability of the receiver, its number of IS-IS 
neighbors… and/or whatever parameters you feel are relevant), that would be 
most appreciated. And if you believe that the Juniper LSP receiver can handle 
any rate from any reasonable (e.g. 50) number of IGP neighbors, without 
(significantly) dropping the received LSPs, that would be even simpler, please 
advise.



ping me for that 1:1 on company email pls

-- tony

_________________________________________________________________________________________________________________________



Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,

Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.



This message and its attachments may contain confidential or privileged 
information that may be protected by law;

they should not be distributed, used or copied without authorisation.

If you have received this email in error, please notify the sender and delete 
this message and its attachments.

As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.

Thank you.

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
