Martin/Tony – I have a different take on this.
None of the work on optimized flooding includes the ability of an algorithm to detect that a flooding failure is due to an incompatibility with another optimized algorithm in use – let alone the ability to adapt the flooding topology it computes to allow "interoperation" with another algorithm. And there is no signaling defined to indicate that Algorithm X is now running in an amended mode. So the answer to your original question – "How long would it take for a typical implementation to set up the new, fixed flooding graph?" – is "this is beyond our capabilities". And even if we defined how to do this, how would you anticipate and test the scenarios you might encounter in a live network?

This is a good example of why we never want to have multiple algorithms deployed in a network at the same time – which is why "leaderless mode" is dangerous: it allows (intentionally or not) multiple algorithms to be enabled in a network simultaneously.

The "Update Process" is the crown jewel of the IGPs. It has been carefully crafted to be 100% reliable – and proven to be so over decades. We are already taking significant risk (though for good reason) by allowing the introduction of an optimized version. Assuming that we can go another order of magnitude further by allowing multiple such algorithms to be enabled at the same time is far too risky. This is a non-goal AFAIAC.

   Les

From: Tony Li <[email protected]> On Behalf Of Tony Li
Sent: Tuesday, November 26, 2024 8:32 AM
To: [email protected]
Cc: Tony Przygienda <[email protected]>; lsr <[email protected]>
Subject: [Lsr] Re: A counter example

Hi Martin,

From the time of the failure:

1. The nodes adjacent to the failure would have to react by updating their LSPs and flooding them. There should be at least one LSP change on each side of the partition.
2. This should trigger an SPF run. Further, since the change represents a change to the flooding topology, all nodes would also have to recompute the flooding topology.
3. On links that were not part of the flooding topology, the periodic CSNP should detect that databases are not synchronized and cross-flood any LSPs that have not yet crossed the partition of the flooding topology. This should also trigger another SPF run.

Most operators would consider that to be an unacceptable convergence time.

Regards,
Tony

On Nov 26, 2024, at 2:26 AM, [email protected] wrote:

Assume this extreme case – different algorithms create a topology with a single point of failure – and this single link actually fails. How long would it take for a typical implementation to set up the new, fixed flooding graph?

Best regards,
Martin

From: Tony Li <[email protected]>
Date: Monday, November 25, 2024 at 17:06
To: Tony Przygienda <[email protected]>
Cc: lsr <[email protected]>
Subject: [Lsr] Re: A counter example

Tony,

I'm sorry that you don't see the lack of biconnectivity as a fatal flaw. I certainly do. However, there is another extension of that counter-example that results in zero flooding. If you prefer, I will post that too.

Tony

On Nov 21, 2024, at 1:15 AM, Tony Przygienda <[email protected]> wrote:

Thanks Tony, good drill-down. I see two points here:

1. The point I take here is that in the example, the resulting prunner-framework flooding covers the full graph, i.e. correctness as in "sufficient flooding" is still assured.
2. The solution may be _not_ optimal in terms of constructing a single CDS, i.e. on the boundaries, full flooding is essentially mandated by the prunner framework.
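[Editor's note] The "sufficient flooding" property in point 1 can be stated as a simple graph check: the links retained after pruning must form a connected subgraph spanning every node, so that an LSP flooded anywhere reaches the whole network. A minimal sketch, using a hypothetical six-node topology (node and link names are illustrative only):

```python
# Sketch: "sufficient flooding" as a graph property. The retained
# (non-pruned) links must connect all nodes. Topology is hypothetical.

def is_sufficient(nodes, flooding_links):
    """Return True if flooding_links form a spanning connected subgraph."""
    adj = {n: set() for n in nodes}
    for a, b in flooding_links:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

nodes = {"A1", "A2", "A3", "B1", "B2", "B3"}
all_links = [("A1", "A2"), ("A2", "A3"), ("B1", "B2"), ("B2", "B3"),
             ("A1", "B1"), ("A2", "B2"), ("A3", "B3")]
# Prune one cross link; the rest still span the graph.
flooding = [l for l in all_links if l != ("A1", "B1")]
print(is_sufficient(nodes, flooding))  # True: flooding is sufficient
```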
Actually, the most extreme case is where _every_ node in the network runs a different algorithm and the prunner framework says "well, flood on all links with a different algorithm on the other side". Then it all collapses into full flooding again.

If that's my correct reading, then please observe that the -prz- draft does NOT state that in mixed-algorithm scenarios _optimal_ flooding in any sense is guaranteed (optimality here seems to mean "a CDS with a minimal number of links"); it only says that the prunner framework will guarantee "sufficient" flooding to build an overall CDS, not less and not more. In fact, that's the paragraph that is possibly a bit cryptic to most, saying that you'd need a "meta-prunner" algorithm for such stuff, or synchronization of the boundaries of a component (funny enough, the considerations in such a design start to be closely related to arbitrary hierarchy principles ;-). There are other considerations, but they become even more arcane, and AFAIS achieving an "optimal" CDS when components with multiple algorithms are mixed is in pragmatic terms not possible.

So, if we agree that the prunner framework (i.e. mixing multiple algorithms) does guarantee "sufficient" flooding (i.e. a full CDS) but does NOT guarantee any "only necessary" flooding, then we're in sync. And it's perfectly fine AFAIS if the WG decides that working on "multiple-algorithm mix in the network" is not something to be pursued; it will be sufficient then to e.g. extend the 97xx to provide leader-based and leaderless signalling as two options (just like there is centralized-computed and distributed already) and say that "mixing both modes, or multiple algorithms under leaderless, is outside the scope of the document".
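[Editor's note] The degenerate case described above is easy to see mechanically: if the boundary rule is "flood on every link whose far end runs a different algorithm" and every node runs a distinct algorithm, the rule retains every link. A small sketch (the topology and algorithm assignment are hypothetical):

```python
# Sketch of the boundary rule collapsing into full flooding when every
# node runs a distinct algorithm. Node names are hypothetical.

links = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R1", "R4"), ("R1", "R3")]
# Assign every node its own, unique algorithm identifier.
algo = {n: i for i, n in enumerate({a for l in links for a in l})}

def must_flood(link):
    """Boundary rule: flood whenever the two ends disagree on algorithm."""
    a, b = link
    return algo[a] != algo[b]

flooded = [l for l in links if must_flood(l)]
print(len(flooded) == len(links))  # True: the rule keeps every link
```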
Not every problem under the sun needs to be solved by a WG, and the practical implications of such scope limitations AFAIS are limited, since for practical purposes mixing limits the blast radius on migrations and nothing else really AFAIU ;-)

So I guess I wasn't specific enough when I said that I don't see a counter-example for the -prz- framework not being correct. By correctness I always meant "any mix of algorithms being prunners in a network will always deliver _sufficient_ flooding", and did not imply any kind of flooding optimality.

thanks

--- tony

On Wed, Nov 20, 2024 at 10:57 PM Tony Li <[email protected]> wrote:

Hi all,

Tony P. asked for a counter-example to the claim that neighbor-only algorithm information is sufficient. This email attempts to articulate just such an example.

Suppose that we have a bipartite network with two halves, A and B. Part A contains nodes A1, A2, A3, …. Part B contains nodes B1, B2, B3, …. The two halves are connected by three links: (A1, B1), (A2, B2), and (A3, B3).

The correct flooding topology in this situation is to select exactly two of the three links. Selecting only one of the links would create a single point of failure. Selecting all three links leads to unacceptable and unnecessary flooding.

Suppose that A1 and B1 are running algorithm X. All other nodes are running algorithm Y. Suppose that under algorithm X, links 2 and 3 are selected. Therefore, A1 and B1 choose to prune (A1, B1). Further, suppose that under algorithm Y, links 1 and 2 are selected. Therefore, nodes A3 and B3 choose to prune (A3, B3).

Now, only (A2, B2) is selected, creating a single point of failure.

The key points here are simple:

- An algorithm makes assumptions about how other nodes in the topology are going to behave. If multiple algorithms are in play, those assumptions may not hold.
- Two concurrent algorithms, while each individually correct, can still produce a flawed flooding topology if they are asked to interoperate.
- Full flooding at the boundary between the algorithms is not sufficient to correct the situation.

Regards,
Tony

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
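[Editor's note] The bipartite counter-example above can be verified mechanically: after algorithm X prunes (A1, B1) and algorithm Y prunes (A3, B3), the remaining cross link (A2, B2) is a bridge, i.e. a single point of failure. A sketch under stated assumptions – the intra-half links are invented for illustration, since the email only enumerates the three cross links:

```python
# Sketch: check that the mixed-algorithm pruning in the counter-example
# leaves (A2, B2) as a bridge. Intra-half links are assumed.

def bridges(nodes, links):
    """Return links whose removal disconnects the graph (naive O(E^2))."""
    def connected(ls):
        adj = {n: set() for n in nodes}
        for a, b in ls:
            adj[a].add(b)
            adj[b].add(a)
        seen, stack = set(), [next(iter(nodes))]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n] - seen)
        return seen == set(nodes)
    return [l for l in links if not connected([m for m in links if m != l])]

nodes = {"A1", "A2", "A3", "B1", "B2", "B3"}
topo = [("A1", "A2"), ("A2", "A3"),                 # part A (assumed)
        ("B1", "B2"), ("B2", "B3"),                 # part B (assumed)
        ("A1", "B1"), ("A2", "B2"), ("A3", "B3")]   # the three cross links
# Algorithm X (on A1/B1) prunes (A1, B1); algorithm Y prunes (A3, B3).
pruned = [l for l in topo if l not in {("A1", "B1"), ("A3", "B3")}]
print(("A2", "B2") in bridges(nodes, pruned))  # True: single point of failure
```

Note that the full topology has no bridges at all; it is only the combination of the two individually reasonable pruning decisions that introduces one.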
