Les,

Upgrades are the motivation for deploying multiple algorithms. Doing so allows for incremental rollout of a new algorithm. Yes, there are significant operational considerations.

Reverting to full flooding is neither practical nor necessary. Migration has the strong advantage of having a minimal blast radius, as has been requested. Interoperability is not a serious problem, as there is a boundary of legacy flooding between dissimilar algorithms. Once you grasp that you have only a single algorithm within a subgraph, debugging gets a whole lot easier.

T

p.s. Tony and I have discussed things offline, and I am hoping that he will revise his drafts so that they are easier to absorb.

> On Dec 4, 2024, at 8:36 AM, Les Ginsberg (ginsberg) - ginsberg at cisco.com <[email protected]> wrote:
>
> Tony –
>
> Upgrades are orthogonal to my comments.
> I am speaking about the need to deploy multiple flooding algorithms in a network (one of which may be “static”).
> We have never considered that in scope before – and there are obvious challenges to doing so – not least of which is the ability to test.
>
> I think when you say “upgrade” you are talking about needing to migrate from algorithm X to algorithm Y – or from Algo X-V1 to Algo X-V2 where V2 has some fix that isn’t fully interoperable with V1.
> We already have a way of handling this case:
>
> Revert to base flooding everywhere – do the upgrade – and then enable the upgraded algo.
> Conceptually, this is consistent with how we have deployed major infra upgrades (e.g., narrow to wide metrics).
>
> This is far safer than trying to deal with co-existence – not least because once you allow co-existence you have to allow that a customer might use this as a permanent state – not just an upgrade state.
> Given the challenges we already face with interoperability even when all routers are trying to “do the same thing” (and I am not limiting this comment to just flooding), the idea that we should now embrace a persistent state where routers are intentionally doing inconsistent things seems at best naïve.
>
> Imagine that you and I are called to root cause problems in a customer network.
> Your implementation supports algorithm X and doesn’t understand algorithm Y.
> My implementation supports algorithm Y and doesn’t understand algorithm X.
> Flooding issues are notoriously difficult to diagnose – even when all nodes are supposed to be doing the same thing.
> All the while our mutual customer is (rightfully) pressuring to get this fixed ASAP.
> We might well ask “how did we get into this mess”.
>
> Les
>
>
> From: Tony Li <[email protected]> On Behalf Of Tony Li
> Sent: Wednesday, December 4, 2024 7:54 AM
> To: Les Ginsberg (ginsberg) <[email protected]>
> Cc: Tony Przygienda <[email protected]>; Peter Psenak (ppsenak) <[email protected]>; Shraddha Hegde <[email protected]>; Robert Raszuk <[email protected]>; lsr <[email protected]>
> Subject: Re: [Lsr] Another counter-example
>
>
> Les,
>
> The step that you’re missing is that upgrades are inevitable and thus an operational necessity.
>
> We are very, very, very unlikely to get things right on the first go. Therefore, we will need to fix our bugs. How do you deploy that bug fix? Add to the mix that we’re not willing to do a flag day cutover to the fix.
>
> A better way of thinking of mesh groups is that they are the ‘static routes’ of legacy flooding. They are installed by network operators and are presumed to be perfect. No signaling necessary.
>
> Tony
>
>
> On Dec 4, 2024, at 7:28 AM, Les Ginsberg (ginsberg) - ginsberg at cisco.com <[email protected]> wrote:
>
> I am very much in agreement with Peter – though I think his commentary is “too kind”. 😊
>
> The issue with mesh groups is that they are opaque to other nodes, i.e., you may come up with a way of signaling that a node has configured mesh groups (which BTW the distoptflood draft does NOT currently have – and I hope it never does…), but unless you are going to also propose that a node signal which links are/are not being used for flooding, the best you can do from the POV of other nodes is to treat the node as if it is running a flooding algorithm which is totally opaque – and which is also “brittle”, i.e., it doesn’t do well in the event of topology changes.
>
> To Tony P – one of the things that disturbs me about the way this discussion is taking place is how we seem to have “skipped steps”.
>
> The interest in optimized flooding dates back decades.
> Early attempts include:
>
> https://datatracker.ietf.org/doc/rfc2973/ (Mesh Groups) (circa 2000)
> https://datatracker.ietf.org/doc/html/draft-ietf-ospf-isis-flood-opt-01 (circa 2001)
> MANET work (circa 2014)
>
> All of these attempts were very conservative in nature. The notion of deploying multiple solutions simultaneously and thinking about how they might “interoperate” was quite deliberately not looked at. The general view has been “be very very careful when you mess with flooding”.
>
> Suddenly, we now seem to have “leaped off the cliff” and are talking about deploying multiple algorithms and trying to get them to “interoperate”.
>
> At what point did the WG conclude that this is a real requirement and that it actually can be deployed safely?
>
> If people want to discuss this – the WG is a fine place to do it. But I would appreciate discussion that does not skip over the very real concerns that have kept us from even considering this for the last three decades.
>
> Les
>
>
> From: Tony Przygienda <[email protected]>
> Sent: Wednesday, December 4, 2024 12:35 AM
> To: Peter Psenak (ppsenak) <[email protected]>
> Cc: Shraddha Hegde <[email protected]>; Robert Raszuk <[email protected]>; Tony Li <[email protected]>; lsr <[email protected]>
> Subject: [Lsr] Re: Another counter-example
>
> Valid point of view, but there are other possible solutions to the whole thing as well that don't precondition mesh-group node lift-up. If consensus passes and we start to work on the details of the necessary leaderless signalling in some framework, then that's part of operational considerations, would be my take ...
>
> thanks
>
> -- tony
>
> On Wed, Dec 4, 2024 at 9:25 AM Peter Psenak <[email protected]> wrote:
> Hi Shraddha,
>
> so you define mesh-groups to be a separate flooding algorithm itself, requiring all routers using them to be upgraded. By the time you do that, you can also replace mesh-groups with distopt on all routers and be done with it, instead of trying to solve the coexistence of the two.
>
> thanks,
> Peter
>
> On 04/12/2024 07:48, Shraddha Hegde wrote:
>
> Hi Robert,
>
> With dist-opt flood reduction running in leaderless mode, it is possible for the operator to run mesh-groups in some part of the network and introduce distopt flooding in other parts where needed. The nodes configured with mesh-groups have to be upgraded to advertise that they are running a different flood reduction algorithm, and the distopt algorithm will ensure that the neighbors of the nodes running mesh-groups always become reflooders; hence correct flooding behaviour is ensured in the CDS where distopt runs.
>
> Some networks have mesh-groups deployed in a well-defined part of the topology, where they reduce back-flooding by 50%. They have been deployed for many years and are serving well. If an operator wants to keep that config and introduce distopt in other parts of the topology (during migration or otherwise), it’s a very valid use case and can be supported with the distopt algorithm.
>
> Rgds
> Shraddha
>
>
> From: Robert Raszuk <[email protected]>
> Sent: 27 November 2024 15:58
> To: Peter Psenak <[email protected]>
> Cc: Tony Li <[email protected]>; Tony Przygienda <[email protected]>; lsr <[email protected]>
> Subject: [Lsr] Re: Another counter-example
>
> > you are talking about mixing the manual mesh group with optimized flooding.
>
> I am talking about an accidental mix (legacy configuration at some nodes), not a planned one.
>
> And you either auto-detect it and disable the ability to optimally flood, or you push full responsibility to the operator.
>
> Thx,
> R.
>
> On Wed, Nov 27, 2024 at 11:16 AM Peter Psenak <[email protected]> wrote:
> Robert,
>
> On 27/11/2024 10:32, Robert Raszuk wrote:
> Peter,
>
> My point was that this should at least be mentioned in the operational considerations section if dynamic flooding is expected to work in mixed networks where some nodes support the new algorithm and some do not (your "regular flooding case").
>
> you are talking about mixing the manual mesh group with optimized flooding. I don't think we want to go that path.
>
> thanks,
>
> Peter
>
>
> On Wed, Nov 27, 2024 at 10:28 AM Peter Psenak <[email protected]> wrote:
> Robert,
>
> On 27/11/2024 10:22, Robert Raszuk wrote:
> Peter,
>
> I am not sure if what Tony said is a requirement or an observation.
>
> > Note that combining routers that run the elected optimized algorithm with routers that do run the regular flooding is not a problem.
>
> Note that static mesh groups can be present today too, and you can't assume that it is either an optimized algorithm or full flooding.
> please do not compare apples with oranges.
>
> Static mesh groups are manually configured and, if not done correctly, can result in broken flooding. What we are discussing here is a dynamic flooding algorithm, not manual flooding blocking.
>
> thanks,
> Peter
>
>
> Thx,
> R.
>
>
> On Wed, Nov 27, 2024 at 9:58 AM Peter Psenak <[email protected]> wrote:
> On 27/11/2024 00:18, Tony Li wrote:
> > A distributed algorithm computing a flooding topology must only operate upon nodes running the same algorithm (and version). If multiple algorithms (and/or versions) are running in the same network, then any given algorithm and version defines a subgraph and the algorithm can only optimize flooding within its own subgraph. Legacy full flooding must be used between subgraphs of different algorithms or versions.
>
> This is a new requirement for the flooding algorithm itself. This does not exist with the existing leader-based election, as that guarantees that only one optimized flooding algorithm is ever present in the area.
> Note that combining routers that run the elected optimized algorithm with routers that do run the regular flooding is not a problem.
>
> thanks,
> Peter
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
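As an illustration only, the subgraph rule in Tony Li's 27 November message (each algorithm and version forms its own subgraph, and legacy full flooding is used between subgraphs) can be sketched as a link classifier. This is a minimal Python sketch under stated assumptions: the per-node (algorithm, version) map, the function names `classify_links` and `algorithm_subgraphs`, and the string labels are hypothetical and are not taken from the distoptflood draft or any implementation. It only decides which links an optimized algorithm would be allowed to prune; it does not compute a flooding topology.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: each node advertises the (algorithm, version) pair it
# runs, or None if it performs legacy full flooding.
AlgoId = Optional[Tuple[str, int]]
Link = Tuple[str, str]


def classify_links(algo: Dict[str, AlgoId], links: List[Link]) -> Dict[Link, str]:
    """Label each link 'optimizable' or 'full-flood' under the subgraph rule."""
    labels: Dict[Link, str] = {}
    for u, v in links:
        # A link is inside a subgraph only if both ends run the same
        # algorithm AND version; every other link is a boundary link on
        # which legacy full flooding must be used.
        same = algo[u] is not None and algo[u] == algo[v]
        labels[(u, v)] = "optimizable" if same else "full-flood"
    return labels


def algorithm_subgraphs(algo: Dict[str, AlgoId], links: List[Link]) -> List[Set[str]]:
    """Connected components of optimized-algorithm nodes joined by optimizable links."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for (u, v), label in classify_links(algo, links).items():
        if label == "optimizable":
            adj[u].add(v)
            adj[v].add(u)

    seen: Set[str] = set()
    subgraphs: List[Set[str]] = []
    for start in sorted(algo):
        if algo[start] is None or start in seen:
            continue  # legacy nodes belong to no optimized subgraph
        # Iterative depth-first search over optimizable links only.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adj[node] - seen)
        subgraphs.append(component)
    return subgraphs


if __name__ == "__main__":
    algo = {
        "A": ("X", 1), "B": ("X", 1),   # algorithm X, version 1
        "C": ("Y", 1), "D": ("Y", 1),   # algorithm Y, version 1
        "E": None,                      # legacy full flooding
        "F": ("X", 2),                  # algorithm X, version 2
    }
    links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E"), ("B", "F")]
    for link, label in classify_links(algo, links).items():
        print(link, label)
    print(algorithm_subgraphs(algo, links))
```

In this example only the A-B and C-D links come out as optimizable. F, which runs a different version of algorithm X, ends up as a singleton subgraph with full flooding on its only link, which mirrors the version boundary described in the thread.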
