[Lsr] Re: Another counter-example

Robert Raszuk Wed, 04 Dec 2024 08:50:30 -0800

> Revert to base flooding everywhere

If the network can deal with base flooding, why optimize it?


I proposed using MI-ISIS for migrations/upgrades and never enabling base
flooding.

Also, I highly recommend we look at what CPU/RAM usage we are dealing with
here with full flooding in large topologies vs. optimized flooding. Basically,
while it looks cool on slides, are we dealing with a real problem for lots of
networks, or maybe just for a very specific few densely meshed fabrics where
other solutions already exist?

Last but not least, I would not dismiss ideas of link-state flooding
reflectors... at least migrations can be made much more controllable than
with full or partial spray.

Thx,
R.

On Wed, Dec 4, 2024 at 5:36 PM Les Ginsberg (ginsberg) <[email protected]>
wrote:

> Tony –
>
> Upgrades are orthogonal to my comments.
>
> I am speaking about the need to deploy multiple flooding algorithms in a
> network (one of which may be “static”).
>
> We have never considered that in scope before – and there are obvious
> challenges to doing so – not least of which is the ability to test.
>
> I think when you say “upgrade” you are talking about needing to migrate
> from algorithm X to algorithm Y – or from Algo X-V1 to Algo X-V2 where V2
> has some fix that isn’t fully interoperable with V1.
>
> We already have a way of handling this case:
>
> Revert to base flooding everywhere – do the upgrade – and then enable the
> upgraded algo.
>
> Conceptually, this is consistent with how we have deployed major infra
> upgrades (e.g., narrow to wide metrics).
>
> This is far safer than trying to deal with co-existence – not least
> because once you allow co-existence you have to allow that a customer might
> use this as a permanent state – not just an upgrade state.
>
> Given the challenges we already face with interoperability even when all
> routers are trying to “do the same thing” (and I am not limiting this
> comment to just flooding), the idea that we should now embrace a
> persistent state where routers are intentionally doing inconsistent things
> seems at best naïve.
>
> Imagine that you and I are called to root cause problems in a customer
> network.
>
> Your implementation supports algorithm X and doesn’t understand algorithm
> Y.
>
> My implementation supports algorithm Y and doesn’t understand algorithm X.
>
> Flooding issues are notoriously difficult to diagnose – even when all
> nodes are supposed to be doing the same thing.
>
> All the while our mutual customer is (rightfully) pressuring us to get
> this fixed ASAP.
>
> We might well ask “how did we get into this mess”.
>
>    Les
>
> *From:* Tony Li <[email protected]> *On Behalf Of *Tony Li
> *Sent:* Wednesday, December 4, 2024 7:54 AM
> *To:* Les Ginsberg (ginsberg) <[email protected]>
> *Cc:* Tony Przygienda <[email protected]>; Peter Psenak (ppsenak) <
> [email protected]>; Shraddha Hegde <[email protected]>; Robert Raszuk <
> [email protected]>; lsr <[email protected]>
> *Subject:* Re: [Lsr] Another counter-example
>
> Les,
>
> The step that you’re missing is that upgrades are inevitable and thus an
> operational necessity.
>
> We are very, very, very unlikely to get things right on the first go.
> Therefore, we will need to fix our bugs. How do you deploy that bug fix?
> Add to the mix that we’re not willing to do a flag day cutover to the fix.
>
> A better way of thinking of mesh groups is that they are the ‘static
> routes’ of legacy flooding. They are installed by network operators and
> are presumed to be perfect. No signaling necessary.
>
> Tony
>
> On Dec 4, 2024, at 7:28 AM, Les Ginsberg (ginsberg) - ginsberg at
> cisco.com <[email protected]> wrote:
>
> I am very much in agreement with Peter – though I think his commentary is
> “too kind”. 😊
>
> The issue with mesh groups is that they are opaque to other nodes, i.e., you
> may come up with a way of signaling that a node has configured mesh groups
> (which BTW the distoptflood draft does NOT currently have – and I hope it
> never does…) but unless you are going to also propose that a node signal
> what links are/are not being used for flooding the best you can do from the
> POV of other nodes is treat the node as if it is running a flooding
> algorithm which is totally opaque – and which is also “brittle” i.e., it
> doesn’t do well in the event of topology changes.
>
> To Tony P – one of the things that disturbs me about the way this
> discussion is taking place is how we seem to have “skipped steps”.
>
> The interest in optimized flooding dates back decades.
>
> Early attempts include:
>
> https://datatracker.ietf.org/doc/rfc2973/ (Mesh Groups) (circa 2000)
>
> https://datatracker.ietf.org/doc/html/draft-ietf-ospf-isis-flood-opt-01
> (circa 2001)
>
> MANET work (circa 2014)
>
> All of these attempts were very conservative in nature. The notion of
> deploying multiple solutions simultaneously and thinking about how they
> might “interoperate” was quite deliberately not looked at. The general view
> has been “be very very careful when you mess with flooding”.
>
> Suddenly, we seem to have “leaped off the cliff” and are talking about
> deploying multiple algorithms and trying to get them to “interoperate”.
>
> At what point did the WG conclude that this is a real requirement and that
> it actually can be deployed safely?
>
> If people want to discuss this – the WG is a fine place to do it. But I
> would appreciate discussion that does not skip over the very real concerns
> that have kept us from even considering this for the last three decades.
>
>    Les
>
> *From:* Tony Przygienda <[email protected]>
> *Sent:* Wednesday, December 4, 2024 12:35 AM
> *To:* Peter Psenak (ppsenak) <[email protected]>
> *Cc:* Shraddha Hegde <[email protected]>; Robert Raszuk <
> [email protected]>; Tony Li <[email protected]>; lsr <[email protected]>
> *Subject:* [Lsr] Re: Another counter-example
>
> Valid point of view, but there are other possible solutions to the whole
> thing as well that don't precondition upgrading ("lifting up") the
> mesh-group nodes. If consensus passes and we start to work on the details
> of the necessary leaderless signalling, doing so in some framework that's
> part of the operational considerations would be my take ...
>
> thanks
>
> -- tony
>
> On Wed, Dec 4, 2024 at 9:25 AM Peter Psenak <[email protected]> wrote:
>
> Hi Shraddha,
>
> so you define mesh-groups to be a separate flooding algorithm itself,
> requiring all routers using them to be upgraded.  By the time you do that,
> you can also replace mesh-groups with distopt on all routers and be done
> with it, instead of trying to solve the coexistence of the two.
>
> thanks,
> Peter
>
> On 04/12/2024 07:48, Shraddha Hegde wrote:
>
> Hi Robert,
>
> With dist-opt flood reduction running in leaderless mode, it is possible
> for the operator to run mesh-groups in some parts of the network and
> introduce distopt flooding in other parts where needed. The nodes
> configured with mesh-groups have to be upgraded to advertise that they are
> running a different flood reduction algorithm, and the distopt algorithm
> will ensure that the neighbors of the nodes running mesh-groups always
> become reflooders; hence correct flooding behaviour is ensured in the CDS
> where distopt runs.
>
> Some networks have mesh-groups deployed in a well-defined part of the
> topology, where configuring them reduces back-flooding by 50%. This has
> been deployed for many years and is serving well. If an operator wants to
> keep that config and introduce distopt in other parts of the topology
> (during migration or otherwise), it’s a very valid use case and can be
> supported with the distopt algorithm.
>
> Rgds
>
> Shraddha
>
> *From:* Robert Raszuk <[email protected]>
> *Sent:* 27 November 2024 15:58
> *To:* Peter Psenak <[email protected]>
> *Cc:* Tony Li <[email protected]>; Tony Przygienda
> <[email protected]>; lsr <[email protected]>
> *Subject:* [Lsr] Re: Another counter-example
>
> > you are talking about mixing the manual mesh group with optimized
> > flooding.
>
> I am talking about an accidental mix (legacy configuration at some nodes)
> not a planned one.
>
>
> or you push full responsibility to the operator.
>
> Thx,
>
> R.
>
> On Wed, Nov 27, 2024 at 11:16 AM Peter Psenak <[email protected]> wrote:
>
> Robert,
>
> On 27/11/2024 10:32, Robert Raszuk wrote:
>
> Peter,
>
> My point was that this should at least be mentioned in the operational
> considerations section if dynamic flooding is expected to work in mixed
> networks where some nodes support the new algorithm and some do not
> (your "regular flooding case").
>
> you are talking about mixing the manual mesh group with optimized
> flooding. I don't think we want to go that path.
>
> thanks,
>
> Peter
>
> On Wed, Nov 27, 2024 at 10:28 AM Peter Psenak <[email protected]> wrote:
>
> Robert,
>
> On 27/11/2024 10:22, Robert Raszuk wrote:
>
> Peter,
>
> I am not sure if what Tony said is a requirement or an observation.
>
> > Note that combining routers that run the elected optimized algorithm
> > with routers that do run the regular flooding is not a problem.
>
> Note that static mesh groups can be present today too and you can't assume
> that it is either an optimized algorithm or full flooding.
>
> please do not compare apples with oranges.
>
> Static mesh groups are manually configured and if not done correctly can
> result in broken flooding. What we are discussing here is a dynamic
> flooding algorithm, not manual flooding blocking.
>
> thanks,
> Peter
>
> Thx,
>
> R.
>
> On Wed, Nov 27, 2024 at 9:58 AM Peter Psenak <ppsenak=
> [email protected]> wrote:
>
> On 27/11/2024 00:18, Tony Li wrote:
> > A distributed algorithm computing a flooding topology must only
> > operate upon nodes running the same algorithm (and version). If
> > multiple algorithms (and/or versions) are running in the same network,
> > then any given algorithm and version defines a subgraph and the
> > algorithm can only optimize flooding within its own subgraph. Legacy
> > full flooding must be used between subgraphs of different algorithms
> > or versions.
>
> This is a new requirement for the flooding algorithm itself. This does
> not exist with the existing leader based election, as that guarantees
> that only one optimized flooding algorithm is ever present in the area.
> Note that combining routers that run the elected optimized algorithm
> with routers that do run the regular flooding is not a problem.
>
> thanks,
> Peter
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
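For readers following the coexistence argument, here is a minimal sketch of the rule Tony Li states in the 27 November message quoted above (each algorithm and version defines a subgraph, optimizes only within it, and legacy full flooding is used between subgraphs), combined with the behaviour Shraddha describes, where distopt neighbors of mesh-group nodes always become reflooders. Everything in it is a hypothetical illustration: the `(name, version)` tuples, the `MESH_GROUP` advertisement, the function names and the R1..R6 topology are assumptions, not taken from the distopt draft or any implementation. It only classifies nodes and adjacencies; it does not compute a flooding topology.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical: a node's flooding algorithm as (name, version);
# None means legacy (base) full flooding.
Algo = Optional[Tuple[str, int]]

# Hypothetical advertisement for a node with static mesh groups configured,
# treated as one more "algorithm", as in Shraddha's proposal.
MESH_GROUP: Algo = ("mesh-group", 1)


def subgraphs(nodes: Dict[str, Algo]) -> Dict[Tuple[str, int], Set[str]]:
    """Each (algorithm, version) pair defines its own subgraph."""
    groups: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for node, algo in nodes.items():
        if algo is not None:  # legacy nodes belong to no optimized subgraph
            groups[algo].add(node)
    return dict(groups)


def link_mode(a: str, b: str, nodes: Dict[str, Algo]) -> str:
    """An algorithm may optimize a link only if both ends are in its subgraph."""
    if nodes[a] is not None and nodes[a] == nodes[b]:
        return "optimized"
    return "full"  # legacy full flooding between different subgraphs


def forced_reflooders(nodes: Dict[str, Algo],
                      links: List[Tuple[str, str]],
                      algo: Algo) -> Set[str]:
    """Nodes running `algo` that neighbor a mesh-group node must reflood."""
    forced: Set[str] = set()
    for a, b in links:
        for node, peer in ((a, b), (b, a)):
            if nodes[node] == algo and nodes[peer] == MESH_GROUP:
                forced.add(node)
    return forced


# Hypothetical topology: a mesh-group island, two distopt versions, one legacy node.
nodes: Dict[str, Algo] = {
    "R1": MESH_GROUP, "R2": MESH_GROUP,
    "R3": ("distopt", 1), "R4": ("distopt", 1),
    "R5": ("distopt", 2), "R6": None,
}
links = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R4", "R5"), ("R5", "R6")]

print(link_mode("R3", "R4", nodes))  # optimized
print(link_mode("R4", "R5", nodes))  # full (version mismatch)
print(sorted(forced_reflooders(nodes, links, ("distopt", 1))))  # ['R3']
```

Treating mesh groups as just another `(name, version)` pair is what lets one boundary rule cover Shraddha's case, and it also makes Peter's objection concrete: the mesh-group nodes only become visible to their neighbors if they advertise something, i.e. if they are upgraded.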

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
