Just to add a voice of support to Tony Li's arguments, to which I really have nothing further to add. He basically elucidated ;-) with flourishes what I wrote in my short, terse email, I think.
As he says, "you either make easy choices and get an operationally unstable environment at scale/large disturbances or you make hard architectural choices & scale much better over time". Examples of that abound in systems design, but it's unfortunately often RFC 1925 (4).

As a sidenote: the IETF has no intention or mechanism to stop people doing unscalable, poorly designed things with their specs, and that was OK as long as people did not try to push them onto the whole world without listening to the folks who took the scar tissue to get us the working IETF specs we have today. Such pushing seems to have become fashionable in the last couple of years.

And fast flooding is a red herring here IMO. It does nothing but accelerate the control loop: if the control loop is positively stable, that's good; if the control loops are oscillating or start to negatively amplify, accelerating things just melts everything faster.

Ultimately, having followed this "discussion", my opinion is still that if the authors would like to abuse the IGP as a "domain-wide broadcast" for liveness notification, the "events" draft is far less fragile and convoluted, but it should be kept in a service instance as basically an "event-based BFD substitute" so that it does not start to cause head-of-line blocking on IGP resources. And AFAIR Robert observed that it's still not a very good indication compared to BFD; a good solution would be, e.g. in the PE case, a (hierarchical) MP2MP BFD PMSI (assuming UDP healthy = TCP healthy, which is however far less of an assumption than "flooding feels transport is OK").

-- tony

On Mon, Nov 22, 2021 at 7:55 AM Tony Li <[email protected]> wrote:

> Les,
>
> The problem is that restricting the prefix length does nothing to limit the number of advertisements that get flooded. In a high-scale situation, when there is a mass failure, it would lead to a flooding spike. That’s exactly not the time to stress the system.
>
> *[LES:] As I have stated previously, I share your concern about the behavior during massive events – and some care has to be taken to prevent making a bad situation worse.*
> *That said, the WG (including you and I) is taking on enhancements to support much faster flooding – on the order of hundreds (perhaps thousands) of LSPs/second. We believe this can be done safely (though proof has not yet been established).*
>
> And the point of doing that was to help improve IGP convergence time…
>
> *So, if you believe (as your active participation suggests) that IGPs can support faster flooding – why do you believe they cannot support liveness notification at a similar scale?*
>
> … not waste our time by inflating the LSDB by the same amount that we sped up flooding.
>
> Also, I don’t see how faster flooding has ANYTHING to do with it. Adding negative liveness information is primarily a scale issue.
>
> *I get that you consider such notifications as architecturally undesirable – we can agree to disagree on that point.*
> *But I don’t get why you think the IGP’s ability to handle large scale events is a showstopper in this case.*
>
> I am opposed to anything that adds to the scale of the LSDB. Doubly so if it does so during failures, when the IGP is already under stress. As you well know, making an IGP stable during normal operations is one thing. Ensuring that it is stable during worst case topological changes is quite another. Adding scale during a mass failure is pessimal timing.
>
> T
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
