Just to add a voice of support for Tony Li's arguments, to which I have nothing
further to add really. He basically elucidated ;-) with flourishes what I
wrote in my short, terse email, I think.

As he says "you either make easy choices and get an operationally unstable
environment at scale/large disturbances or you make hard architectural
choices & scale much better over time". Examples of that abound in
systems design, but unfortunately it's often a case of RFC 1925 (4). As a sidenote:
IETF has no intention or mechanism to stop people doing unscalable, poorly
designed things with their specs, and that was OK as long as people did not try
to push them onto the whole world without listening to the folks who earned the
scar tissue getting us the working IETF specs we have today. Ignoring those
folks seems to have become fashionable in the last couple of years.

And fast flooding is a red herring here IMO. It does nothing but accelerate
the control loop: if the control loop is stable, that's good; if the control
loops are oscillating or start to amplify disturbances, accelerating things
just melts everything faster.
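
To make the control-loop point concrete, here is a toy sketch. It is purely
illustrative and models nothing about real IS-IS/OSPF flooding: the function
name `simulate`, the per-iteration `gain`, and the `steps_per_second` knob are
all made-up assumptions. Each loop iteration multiplies the outstanding
disturbance by `gain`; "faster flooding" is modelled only as more iterations
per second.

```python
def simulate(gain, steps_per_second, seconds=1.0, initial_error=1.0):
    """Toy feedback loop (hypothetical model, not a protocol simulation).

    Each iteration multiplies the remaining disturbance by `gain`:
      |gain| < 1  -> the loop damps the disturbance (stable)
      |gain| > 1  -> the loop amplifies it (unstable)
      gain < 0    -> the sign flips every step (oscillation)
    Returns the magnitude of the disturbance after `seconds`.
    """
    error = initial_error
    for _ in range(int(steps_per_second * seconds)):
        error *= gain
    return abs(error)


if __name__ == "__main__":
    # Same loops, run at a "slow" and a "fast" iteration rate.
    for label, gain in [("stable", 0.9), ("amplifying", 1.1), ("oscillating", -1.05)]:
        slow = simulate(gain, steps_per_second=10)
        fast = simulate(gain, steps_per_second=100)
        print(f"{label:12s} slow={slow:.3g} fast={fast:.3g}")
```

In this toy model, speeding up the stable loop only gets it to zero sooner,
while speeding up the amplifying or oscillating loop only makes it blow up
sooner. The speed changes nothing about which way the loop goes.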

Ultimately, having followed this "discussion", my opinion is still that if
the authors would like to abuse the IGP as a "domain-wide broadcast" for
liveness notification, the "events" draft is far less fragile and convoluted,
but it should be kept in a service instance as basically an "event-based BFD
substitute" so that it does not start to cause head-of-line blocking on IGP
resources. And AFAIR Robert observed it's still not a very good indication
compared to BFD. A good solution would be, e.g. in the PE case, a
(hierarchical) MP2MP BFD PMSI (assuming UDP healthy = TCP healthy, which is
however far less of an assumption than "flooding feels transport is OK").

-- tony

On Mon, Nov 22, 2021 at 7:55 AM Tony Li <[email protected]> wrote:

>
> Les,
>
>
> The problem is that restricting the prefix length does nothing to limit
> the number of advertisements that get flooded.  In a high-scale situation,
> when there is a mass failure, it would lead to a flooding spike. That’s
> exactly not the time to stress the system.
>
> *[LES:] As I have stated previously, I share your concern about the
> behavior during massive events – and some care has to be taken to prevent
> making a bad situation worse.*
> *That said, the WG (including you and I)  is taking on enhancements to
> support much faster flooding – on the order of hundreds (perhaps thousands)
> of LSPs/second. We believe this can be done safely (though proof has not
> yet been established).*
>
>
>
> And the point of doing that was to help improve IGP convergence time…
>
>
> *So, if you believe (as your active participation suggests) that IGPs can
> support faster flooding – why do you believe they cannot support liveness
> notification at a similar scale?*
>
>
>
> … not waste our time by inflating the LSDB by the same amount that we sped
> up flooding.
>
> Also, I don’t see how faster flooding has ANYTHING to do with it. Adding
> negative liveness information is primarily a scale issue.
>
>
>
> *I get that you consider such notifications as architecturally undesirable
> – we can agree to disagree on that point.*
> *But I don’t get why you think the IGP’s ability to handle large scale
> events is a showstopper in this case.*
>
>
>
> I am opposed to anything that adds to the scale of the LSDB. Doubly so if
> it does so during failures, when the IGP is already under stress. As you
> well know, making an IGP stable during normal operations is one thing.
> Ensuring that it is stable during worst case topological changes is quite
> another. Adding scale during a mass failure is pessimal timing.
>
> T
>
>
>