Martin/Tony – I have a different take on this.
None of the work on optimized flooding includes the ability of an algorithm to detect that a flooding failure is due to an incompatibility with another optimized algorithm in use – let alone the ability to adapt the flooding topology it computes to allow "interoperation" with another algorithm. And there is no signaling defined to indicate that Algorithm X is now running in an amended mode. So the answer to your original question – "How long would it take for a typical implementation to set up the new, fixed flooding graph?" – is "this is beyond our capabilities". And even if we defined how to do this, how would you anticipate and test the scenarios you might encounter in a live network?

This is a good example of why we never want to have multiple algorithms deployed in a network at the same time – which is why "leaderless mode" is dangerous: it allows (intentionally or not) multiple algorithms to be enabled in a network simultaneously.

The "Update Process" is the crown jewel of the IGPs. It has been carefully crafted to be 100% reliable – and proven to be so over decades. We are already taking significant risk (though for good reason) by allowing the introduction of an optimized version. Assuming that we can go another order of magnitude further by allowing multiple such algorithms to be enabled at the same time is far too risky. This is a non-goal AFAIAC.

   Les

From: Tony Li <[email protected]> On Behalf Of Tony Li
Sent: Tuesday, November 26, 2024 8:32 AM
To: [email protected]
Cc: Tony Przygienda <[email protected]>; lsr <[email protected]>
Subject: [Lsr] Re: A counter example

Hi Martin,

From the time of the failure:

1. The nodes adjacent to the failure would have to react by updating their LSPs and flooding them. There should be at least one LSP change on each side of the partition.
2. This should trigger an SPF run. Further, since the change represents a change to the flooding topology, all nodes would also have to recompute the flooding topology.
3. On links that were not part of the flooding topology, the periodic CSNP should detect that databases are not synchronized and cross-flood any LSPs that have not yet crossed the partition of the flooding topology. This should also trigger another SPF run.

Most operators would consider that to be an unacceptable convergence time.

Regards,
Tony

On Nov 26, 2024, at 2:26 AM, [email protected] wrote:

Assume this extreme case – different algorithms create a topology with a single point of failure – and this single link actually fails. How long would it take for a typical implementation to set up the new, fixed flooding graph?

Best regards,
Martin

From: Tony Li <[email protected]>
Date: Monday, November 25, 2024 at 17:06
To: Tony Przygienda <[email protected]>
Cc: lsr <[email protected]>
Subject: [Lsr] Re: A counter example

Tony,

I'm sorry that you don't see the lack of biconnectivity as a fatal flaw. I certainly do. However, there is another extension of that counter-example that results in zero flooding. If you prefer, I will post that too.

Tony

On Nov 21, 2024, at 1:15 AM, Tony Przygienda <[email protected]> wrote:

Thanks Tony, good drill-down. I see two points here:

1. The point I take here is that in the example, the resulting prunner-framework flooding covers the full graph, i.e. correctness as in "sufficient flooding" is still assured.
2. The solution may be _not_ optimal in terms of constructing a single CDS, i.e. on the boundaries, full flooding is essentially mandated by the prunner framework.
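[Editor's note] The "sufficient flooding" property in point 1 can be stated as a simple graph check: the links retained after pruning must form a connected subgraph spanning every node, so that an LSP flooded anywhere reaches the whole network. A minimal sketch, using a hypothetical six-node topology (node and link names are illustrative only):

```python
# Sketch: "sufficient flooding" as a graph property. The retained
# (non-pruned) links must connect all nodes. Topology is hypothetical.

def is_sufficient(nodes, flooding_links):
    """Return True if flooding_links form a spanning connected subgraph."""
    adj = {n: set() for n in nodes}
    for a, b in flooding_links:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

nodes = {"A1", "A2", "A3", "B1", "B2", "B3"}
all_links = [("A1", "A2"), ("A2", "A3"), ("B1", "B2"), ("B2", "B3"),
             ("A1", "B1"), ("A2", "B2"), ("A3", "B3")]
# Prune one cross link; the rest still span the graph.
flooding = [l for l in all_links if l != ("A1", "B1")]
print(is_sufficient(nodes, flooding))  # True: flooding is sufficient
```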
Actually, the most extreme case is where _every_ node in the network runs a different algorithm and the prunner framework says "well, flood on all links with a different algorithm on the other side". Then it all collapses into full flooding again.

If that's my correct reading, then please observe that the -prz- draft does NOT state that in mixed-algorithm scenarios _optimal_ flooding in any sense is guaranteed (optimality here seems to mean "a CDS with a minimal number of links"); it only says that the prunner framework will guarantee "sufficient" flooding to build an overall CDS, not less and not more. In fact, that's the paragraph that is possibly a bit cryptic to most, saying that you'd need a "meta-prunner" algorithm for such stuff, or synchronization of the boundaries of a component (funny enough, the considerations in such a design start to be closely related to arbitrary hierarchy principles ;-). There are other considerations, but they become even more arcane, and AFAIS achieving an "optimal" CDS when components with multiple algorithms are mixed is in pragmatic terms not possible.

So, if we agree that the prunner framework (i.e. mixing multiple algorithms) does guarantee "sufficient" flooding (i.e. a full CDS) but does NOT guarantee any "only necessary" flooding, then we're in sync. And it's perfectly fine AFAIS if the WG decides that working on "multiple-algorithm mix in the network" is not something to be pursued; it will be sufficient then to e.g. extend the 97xx to provide leader-based and leaderless signalling as two options (just like there is centralized-computed and distributed already) and say that "mixing both modes, or multiple algorithms under leaderless, is outside the scope of the document".
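[Editor's note] The degenerate case described above is easy to see mechanically: if the boundary rule is "flood on every link whose far end runs a different algorithm" and every node runs a distinct algorithm, the rule retains every link. A small sketch (the topology and algorithm assignment are hypothetical):

```python
# Sketch of the boundary rule collapsing into full flooding when every
# node runs a distinct algorithm. Node names are hypothetical.

links = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R1", "R4"), ("R1", "R3")]
# Assign every node its own, unique algorithm identifier.
algo = {n: i for i, n in enumerate({a for l in links for a in l})}

def must_flood(link):
    """Boundary rule: flood whenever the two ends disagree on algorithm."""
    a, b = link
    return algo[a] != algo[b]

flooded = [l for l in links if must_flood(l)]
print(len(flooded) == len(links))  # True: the rule keeps every link
```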
Not every problem under the sun needs to be solved by a WG, and the practical implications of such scope limitations AFAIS are limited, since for practical purposes mixing limits the blast radius on migrations and nothing else really AFAIU ;-)

So I guess I wasn't specific enough when I said that I don't see a counter-example for the -prz- framework not being correct. By correctness I always meant "any mix of algorithms being prunners in a network will always deliver _sufficient_ flooding", and did not imply any kind of flooding optimality.

thanks

--- tony

On Wed, Nov 20, 2024 at 10:57 PM Tony Li <[email protected]> wrote:

Hi all,

Tony P. asked for a counter-example to the claim that neighbor-only algorithm information is sufficient. This email attempts to articulate just such an example.

Suppose that we have a bipartite network with two halves, A and B. Part A contains nodes A1, A2, A3, …. Part B contains nodes B1, B2, B3, …. The two halves are connected by three links: (A1, B1), (A2, B2), and (A3, B3).

The correct flooding topology in this situation is to select exactly two of the three links. Selecting only one of the links would create a single point of failure. Selecting all three links leads to unacceptable and unnecessary flooding.

Suppose that A1 and B1 are running algorithm X. All other nodes are running algorithm Y. Suppose that under algorithm X, links 2 and 3 are selected. Therefore, A1 and B1 choose to prune (A1, B1). Further, suppose that under algorithm Y, links 1 and 2 are selected. Therefore, nodes A3 and B3 choose to prune (A3, B3).

Now, only (A2, B2) is selected, creating a single point of failure.

The key points here are simple:

- An algorithm makes assumptions about how other nodes in the topology are going to behave. If multiple algorithms are in play, those assumptions may not hold.
- Two concurrent algorithms, while each individually correct, can still produce a flawed flooding topology if they are asked to interoperate.
- Full flooding at the boundary between the algorithms is not sufficient to correct the situation.

Regards,
Tony

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
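[Editor's note] The bipartite counter-example above can be verified mechanically: after algorithm X prunes (A1, B1) and algorithm Y prunes (A3, B3), the remaining cross link (A2, B2) is a bridge, i.e. a single point of failure. A sketch under stated assumptions – the intra-half links are invented for illustration, since the email only enumerates the three cross links:

```python
# Sketch: check that the mixed-algorithm pruning in the counter-example
# leaves (A2, B2) as a bridge. Intra-half links are assumed.

def bridges(nodes, links):
    """Return links whose removal disconnects the graph (naive O(E^2))."""
    def connected(ls):
        adj = {n: set() for n in nodes}
        for a, b in ls:
            adj[a].add(b)
            adj[b].add(a)
        seen, stack = set(), [next(iter(nodes))]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n] - seen)
        return seen == set(nodes)
    return [l for l in links if not connected([m for m in links if m != l])]

nodes = {"A1", "A2", "A3", "B1", "B2", "B3"}
topo = [("A1", "A2"), ("A2", "A3"),                 # part A (assumed)
        ("B1", "B2"), ("B2", "B3"),                 # part B (assumed)
        ("A1", "B1"), ("A2", "B2"), ("A3", "B3")]   # the three cross links
# Algorithm X (on A1/B1) prunes (A1, B1); algorithm Y prunes (A3, B3).
pruned = [l for l in topo if l not in {("A1", "B1"), ("A3", "B3")}]
print(("A2", "B2") in bridges(nodes, pruned))  # True: single point of failure
```

Note that the full topology has no bridges at all; it is only the combination of the two individually reasonable pruning decisions that introduces one.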
