[Lsr] Re: Another counter-example

Les Ginsberg (ginsberg) Wed, 04 Dec 2024 08:44:17 -0800

Tony –

Upgrades are orthogonal to my comments.
I am speaking about the need to deploy multiple flooding algorithms in a 
network (one of which may be “static”).
We have never considered that in scope before – and there are obvious 
challenges to doing so – not least of which is the ability to test.

I think when you say “upgrade” you are talking about needing to migrate from 
algorithm X to algorithm Y – or from Algo X-V1 to Algo X-V2 where V2 has some 
fix that isn’t fully interoperable with V1.
We already have a way of handling this case:

Revert to base flooding everywhere – do the upgrade – and then enable the 
upgraded algo.
Conceptually, this is consistent with how we have deployed major infra upgrades 
(e.g., narrow to wide metrics).
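
The three phases above can be sketched as a small simulation. This is only an illustrative sketch: the names staged_migration, optimized_algos_in_use and BASE, and the router and algorithm labels, are hypothetical and do not come from any draft or implementation. It shows that at no step are two different optimized algorithms active at once.

```python
BASE = "base"  # hypothetical label for legacy full flooding


def staged_migration(routers, old_algo, new_algo):
    """Yield the network state (router -> flooding mode) after each step."""
    state = {r: old_algo for r in routers}
    # Phase 1: revert every router to base flooding.
    for r in routers:
        state[r] = BASE
        yield dict(state)
    # Phase 2: the software upgrade happens while everyone uses base
    # flooding, so the flooding mode itself does not change.
    yield dict(state)
    # Phase 3: enable the upgraded algorithm router by router.
    for r in routers:
        state[r] = new_algo
        yield dict(state)


def optimized_algos_in_use(state):
    """Return the set of non-base flooding algorithms active in a state."""
    return {mode for mode in state.values() if mode != BASE}


states = list(staged_migration(["r1", "r2", "r3"], "X-V1", "X-V2"))
```

In every intermediate state an optimized algorithm only ever coexists with base flooding, never with a different optimized algorithm.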

This is far safer than trying to deal with co-existence – not least because 
once you allow co-existence you have to allow that a customer might use this as 
a permanent state – not just an upgrade state.
Given the challenges we already face with interoperability even when all 
routers are trying to “do the same thing” (and I am not limiting this comment 
to just flooding), the idea that we should now embrace a persistent state 
where routers are intentionally doing inconsistent things seems at best naïve.

Imagine that you and I are called to root cause problems in a customer network.
Your implementation supports algorithm X and doesn’t understand algorithm Y.
My implementation supports algorithm Y and doesn’t understand algorithm X.
Flooding issues are notoriously difficult to diagnose – even when all nodes are 
supposed to be doing the same thing.
All the while our mutual customer is (rightfully) pressuring us to get this 
fixed ASAP.
We might well ask “how did we get into this mess”.

   Les

From: Tony Li <[email protected]> On Behalf Of Tony Li
Sent: Wednesday, December 4, 2024 7:54 AM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: Tony Przygienda <[email protected]>; Peter Psenak (ppsenak) 
<[email protected]>; Shraddha Hegde <[email protected]>; Robert Raszuk 
<[email protected]>; lsr <[email protected]>
Subject: Re: [Lsr] Another counter-example

Les,

The step that you’re missing is that upgrades are inevitable and thus an 
operational necessity.

We are very, very, very unlikely to get things right on the first go. 
Therefore, we will need to fix our bugs. How do you deploy that bug fix? Add to 
the mix that we’re not willing to do a flag day cutover to the fix.

A better way of thinking of mesh groups is that they are the ’static routes’ of 
legacy flooding.  They are installed by network operators and are presumed to 
be perfect. No signaling necessary.

Tony

On Dec 4, 2024, at 7:28 AM, Les Ginsberg (ginsberg) - ginsberg at cisco.com 
<[email protected]> wrote:

I am very much in agreement with Peter – though I think his commentary is “too 
kind”. 😊

The issue with mesh groups is that they are opaque to other nodes, i.e., you may 
come up with a way of signaling that a node has configured mesh groups (which 
BTW the distoptflood draft does NOT currently have – and I hope it never does…) 
but unless you are going to also propose that a node signal what links are/are 
not being used for flooding the best you can do from the POV of other nodes is 
treat the node as if it is running a flooding algorithm which is totally opaque 
– and which is also “brittle” i.e., it doesn’t do well in the event of topology 
changes.

To Tony P – one of the things that disturbs me about the way this discussion is 
taking place is how we seem to have “skipped steps”.

The interest in optimized flooding dates back decades.
Early attempts include:

https://datatracker.ietf.org/doc/rfc2973/ (Mesh Groups) (circa 2000)
https://datatracker.ietf.org/doc/html/draft-ietf-ospf-isis-flood-opt-01 (circa 
2001)
MANET work (circa 2014)

All of these attempts were very conservative in nature. The notion of deploying 
multiple solutions simultaneously and thinking about how they might 
“interoperate” was quite deliberately not looked at. The general view has been 
“be very very careful when you mess with flooding”.

Suddenly, we now seem to have “leaped off the cliff” and are talking about 
deploying multiple algorithms and trying to get them to “interoperate”.

At what point did the WG conclude that this is a real requirement and that it 
actually can be deployed safely?

If people want to discuss this – the WG is a fine place to do it. But I would 
appreciate discussion that does not skip over the very real concerns that have 
kept us from even considering this for the last three decades.

   Les

From: Tony Przygienda <[email protected]>
Sent: Wednesday, December 4, 2024 12:35 AM
To: Peter Psenak (ppsenak) <[email protected]>
Cc: Shraddha Hegde <[email protected]>; Robert Raszuk <[email protected]>; 
Tony Li <[email protected]>; lsr <[email protected]>
Subject: [Lsr] Re: Another counter-example

Valid point of view, but there are other possible solutions to the whole thing 
that don't make mesh-group node lift-up a precondition. If consensus passes 
and we start to work on the details of the necessary leaderless signalling, then 
covering this in some framework that's part of operational considerations would 
be my take ...

thanks

-- tony

On Wed, Dec 4, 2024 at 9:25 AM Peter Psenak <[email protected]> wrote:

Hi Shraddha,

so you define mesh-groups to be a separate flooding algorithm itself, requiring 
all routers using them to be upgraded.  By the time you do that, you can also 
replace mesh-groups with distopt on all routers and be done with it, instead 
of trying to solve the coexistence of the two.

thanks,
Peter

On 04/12/2024 07:48, Shraddha Hegde wrote:
Hi Robert,

With dist-opt flood reduction running in leaderless mode, it is possible for the 
operator to run mesh-groups in some part of the network and introduce distopt 
flooding in other parts where needed. The nodes configured with mesh-groups have 
to be upgraded to advertise that they are running a different flood reduction 
algorithm, and the distopt algorithm will ensure that the neighbors of the nodes 
running mesh-groups always become reflooders. Hence correct flooding behaviour 
is ensured in the CDS where distopt runs.

Some networks have mesh-groups deployed in a well-defined part of the topology, 
where the mesh-group configuration reduces back-flooding by 50%. This has been 
deployed for many years and is serving well.  If an operator wants to keep 
that config and introduce distopt in other parts of the topology (during 
migration or otherwise), it’s a very valid use case and can be supported with 
the distopt algorithm.

Rgds
Shraddha

From: Robert Raszuk <[email protected]>
Sent: 27 November 2024 15:58
To: Peter Psenak <[email protected]>
Cc: Tony Li <[email protected]>; Tony Przygienda <[email protected]>; lsr 
<[email protected]>
Subject: [Lsr] Re: Another counter-example

> you are talking about mixing the manual mesh group with optimized flooding.

I am talking about an accidental mix (legacy configuration at some nodes) not a 
planned one.

And you either auto detect it and disable the ability to optimally flood or you 
push full responsibility to the operator.

Thx,
R.

On Wed, Nov 27, 2024 at 11:16 AM Peter Psenak <[email protected]> wrote:
Robert,

On 27/11/2024 10:32, Robert Raszuk wrote:
Peter,

My point was that this should at least be mentioned in the operational 
considerations section if dynamic flooding is expected to work in mixed 
networks where some nodes support the new algorithm and some do not (your 
"regular flooding case").

you are talking about mixing the manual mesh group with optimized flooding. I 
don't think we want to go that path.

thanks,

Peter

On Wed, Nov 27, 2024 at 10:28 AM Peter Psenak <[email protected]> wrote:
Robert,

On 27/11/2024 10:22, Robert Raszuk wrote:
Peter,

I am not sure if what Tony said is a requirement or an observation.

> Note that combining routers that run the elected optimized algorithm
> with routers that do run the regular flooding is not a problem.

Note that static mesh groups can be present today too and you can't assume that 
it is either an optimized algorithm or full flooding.

please do not compare apples with oranges.

Static mesh groups are manually configured and if not done correctly can result 
in broken flooding. What we are discussing here is a dynamic flooding 
algorithm, not manual flooding blocking.

thanks,
Peter

Thx,
R.

On Wed, Nov 27, 2024 at 9:58 AM Peter Psenak <[email protected]> wrote:
On 27/11/2024 00:18, Tony Li wrote:
> A distributed algorithm computing a flooding topology must only
> operate upon nodes running the same algorithm (and version). If
> multiple algorithms (and/or versions) are running in the same network,
> then any given algorithm and version defines a subgraph and the
> algorithm can only optimize flooding within its own subgraph. Legacy
> full flooding must be used between subgraphs of different algorithms
> or versions.
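
The rule quoted above can be sketched as a link classification. This is a minimal sketch under stated assumptions: the function name classify_links, the (algorithm, version) tuples, and the use of None for a legacy router are all hypothetical and are not taken from any draft. A link belongs to an algorithm's subgraph only when both endpoints run the same algorithm and version; every other link falls back to legacy full flooding.

```python
from collections import defaultdict


def classify_links(node_algo, links):
    """Split links into per-algorithm subgraphs and full-flooding links.

    node_algo: dict mapping node -> (algorithm, version), or None for a
               legacy router that runs no optimized algorithm (assumption).
    links:     iterable of (node_a, node_b) pairs.
    """
    subgraphs = defaultdict(list)
    full_flooding = []
    for a, b in links:
        algo_a, algo_b = node_algo.get(a), node_algo.get(b)
        if algo_a is not None and algo_a == algo_b:
            # Same algorithm and version on both ends: the algorithm
            # may optimize flooding over this link.
            subgraphs[algo_a].append((a, b))
        else:
            # Different algorithm, different version, or a legacy node:
            # legacy full flooding must be used on this link.
            full_flooding.append((a, b))
    return dict(subgraphs), full_flooding


node_algo = {"A": ("X", 1), "B": ("X", 1), "C": ("X", 2), "D": ("Y", 1), "E": None}
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
subgraphs, full_flooding = classify_links(node_algo, links)
```

Here only the A-B link lands in the ("X", 1) subgraph. The B-C link crosses a version boundary, so it is treated the same way as a link between different algorithms.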

This is a new requirement for the flooding algorithm itself. This does
not exist with the existing leader based election, as that guarantees
that only one optimized flooding algorithm is ever present in the area.
Note that combining routers that run the elected optimized algorithm
with routers that do run the regular flooding is not a problem.

thanks,
Peter

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
