Blast radius is a concept that really comes from security (originally from grenades, in fact ;-) and is quite commonly used to describe the "size of impact of a failure/misconfiguration/change on an entity, or of the entity being compromised". In this context we can loosely talk about "re-computation and change of CDS links on a topological change, i.e. link/node failure or join/removal from an algo". It is also now used in deployment to consider the "impact of change/(mis-)configuration on a single node onto the forwarding in the network", due to the significant number of outages within the last couple of years caused by such "changes" in very large networks. Colloquially, one could argue that e.g. redistribution of prefixes into a node has a significant blast radius (we know _those_ failures), or even a link failure (due to the SPF involved), but IGP being IGP, certain things are distributed computation and core functionality and cannot be easily mitigated (though e.g. knobs to prevent excessive redistribution via unintended configuration are common now).

By my count, significantly more than two operators of large ISIS networks have already chimed in and clearly indicated that they want work on a leaderless solution to happen in the WG (and, to my knowledge, multiple others holding the same opinion chose to stay silent). The consensus call was extended specifically for "Leaderless Flooding Algorithm for Distributed Flood Reduction to allow reduced configuration, minimal blast radius, and ease of incremental deployment" and not some "additional mechanism ...".

If the consensus call passes, AFAIS the implications for the points below are:

* blast radius is as defined above unless another definition is put forward. Basically the desire is that "reconfiguration/failure of a single node influences the minimal possible number of other nodes in the network". While this is not an inherent property of a leaderless algorithm, the suggested disttopo comes as close to the goal as we could make it, unless something else is suggested or improvements to the algorithm are shown.

* unless the risks are outlined via clear technical or operational examples, or a counter-example to the -prz- framework draft is provided, I would consider further claims of apocalyptic outcomes the moment two nodes are configured to use different prunners to be lacking rationale. Yes, the framework is restricted to algorithms being prunners (with the addition of a strict MUST for CDS-only-in-own-component that crystallized in this thread). My previous email delivered the logical chain that makes the prunner property "necessary and sufficient" to achieve interoperable co-existence of distributed flood reduction algorithms (and even centrally computed ones).

* having multiple algorithms or versions during a transition phase addresses the "ease of incremental deployment with minimal configuration and blast radius" of the consensus call AFAIS

Does "leaderless" force an operator into running multiple algorithms at the same time? It does not.
After the leaderless work is done, disttopo could e.g. be added to RFC97xx as well if an operator prefers that mode of operation for some reason, or migration can happen by simply disabling/enabling one algorithm after another, which is a minor variation on the flag day where a leader is replaced by e.g. network provisioning automation.

From here on, the discussion would benefit from specific technical and operational examples of risks or unnecessary complexity in meeting the goals of the consensus call rather than held beliefs AFAIS

thanks

-- tony

On Thu, Dec 5, 2024 at 4:09 PM Les Ginsberg (ginsberg) <[email protected]> wrote:

> Tony –
>
> There are multiple assumptions implicit in your response.
>
> You assume that the understanding of the realities of “blast radius” by all parties is accurate and correct. I believe this still requires examination i.e., that the actual “blast radius” associated with leader-based when implemented correctly is not inevitably global.
>
> You assume that the risks associated with having multiple algorithms enabled in the network (either as a transient or a permanent state) have been fully vetted. I think this deserves further scrutiny.
>
> You assume that there are real deployment needs to have multiple algorithms deployed simultaneously in a network. I believe this deserves further scrutiny.
>
> I believe we are closer to the beginning of this discussion than the end.
>
> The consensus call started by Acee was “whether or not we want to work on an additional mechanism…”.
>
> I agree that the clear consensus on that is “yes” – but what we have agreed to is to discuss/work – we haven’t actually done the work yet.
>
> Les
>
> *From:* Tony Li <[email protected]> *On Behalf Of* Tony Li
> *Sent:* Thursday, December 5, 2024 6:53 AM
> *To:* Peter Psenak (ppsenak) <[email protected]>
> *Cc:* Les Ginsberg (ginsberg) <[email protected]>; Tony Przygienda <[email protected]>; Shraddha Hegde <[email protected]>; Robert Raszuk <[email protected]>; lsr <[email protected]>
> *Subject:* [Lsr] Re: Another counter-example
>
> Hi Peter,
>
> One can migrate from one algo to the other without reverting to the full flooding using the leader announced algo.
>
> Perhaps you missed the numerous operators who have requested a leaderless approach that limited the blast radius.
>
> Acee started a consensus check here:
> https://mailarchive.ietf.org/arch/msg/lsr/4HZD9pxaHMCDhfUQMtb4mepBW4Q/
>
> I have yet to see a closure of the consensus check, but IMHO, the trend is quite clear.
>
> Tony
