Tony – In the interest of brevity, I am not going to respond in detail to each of your points. My reply focuses on two things.
1) You can successfully deploy this algorithm in the presence of nodes which do NOT support this algorithm. But you cannot successfully deploy this algorithm in the presence of nodes which enable a different flooding reduction algorithm. Given that this is not the only flooding reduction algorithm which has already been proposed – nor is it likely to be the last one proposed – it would seem advantageous and prudent to provide a means for nodes to know what algorithm is in use and ensure that multiple algorithms are not enabled simultaneously – which is what draft-ietf-lsr-dynamic-flooding provides.

You seem to be saying “this is the only flooding reduction algorithm we need” and that you are not interested in allowing deployment of anything else – now or in the future. This lessens my enthusiasm for this draft.

The mechanisms proposed in draft-ietf-lsr-dynamic-flooding are analogous to what is used for DIS election and (more recently) for selecting the winning FAD for a given flex-algo. Given the significant deployment of flex-algo and the long history of DIS election, I am surprised at the degree of concern you have for the use of these mechanisms.

2) Regarding the use of PSNPs… you propose to send a PSNP (once, apparently) which has the LSP entries for all the LSPs which you chose NOT to flood to a given node (minus any LSPs for which you may have received an explicit ack) in the most recent time interval - suggested to be one second.

What will happen when you send this? Let’s use a simple example where one LSP was selectively flooded – call it A.00-01 (Seq #100). NOTE: This example assumes a P2P circuit.

a) Neighbor receives the PSNP, already has A.00-01 (Seq #100) in its LSPDB – no action taken. All is good.

b) Neighbor receives the PSNP, does not have A.00-01 (Seq #100) in its LSPDB – sends a PSNP back to the originator requesting that the LSP be flooded.
At this point I assume normal flooding procedures apply, i.e., the SRM flag is set, causing the LSP to be flooded, and I assume SRM remains set until the LSP is acknowledged. All is good – but the additional flooding is likely to be redundant, as the node which had the responsibility for sending this LSP to your neighbor should be doing so reliably.

c) Neighbor does not receive the PSNP. If the neighbor does not have A.00-01 (Seq #100) in its database, the one-time sending of the special PSNP won’t trigger sending of the missing LSP. As the draft does not propose that the special PSNP be resent, I assume the only LSP entries sent in the next special PSNP would be other LSPs that were partially flooded in the subsequent interval – not A.00-01.

Periodic CSNPs can be dropped as well, but as periodic CSNPs are guaranteed to be sent continuously at some interval and they cover the entire LSPDB, reliability of the Update process is assured. Under some pathological conditions it might take a significant amount of time to converge, but it is assured.

What then do these special PSNPs provide? It could be argued that they provide a lower cost and more targeted recovery mechanism in some circumstances – and that using them in conjunction with periodic CSNPs may speed convergence. However, I think the existing proposal discussed in Section 2.3 of the draft lacks detail and is unlikely to achieve this goal in most circumstances.

The time period of 1 second is too aggressive. You may end up sending the special PSNP before the node which has the responsibility for flooding the LSP to your neighbor has even had a chance to do the flooding – which will undermine the benefits of the flooding reduction.
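Cases a) through c) above can be sketched as a toy model. This is only an illustration under stated assumptions, not anything taken from the draft: `Neighbor`, `one_shot_psnp`, and `periodic_csnp` are hypothetical names, the LSPDB is reduced to a map from LSP ID to sequence number, and packet loss is modeled as a simple boolean.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    # Toy LSPDB: LSP ID -> highest sequence number held (hypothetical model).
    lspdb: dict = field(default_factory=dict)

    def missing_from(self, snp_entries):
        """LSP IDs the neighbor would request after receiving an SNP."""
        return [lsp_id for lsp_id, seq in snp_entries.items()
                if self.lspdb.get(lsp_id, 0) < seq]


def one_shot_psnp(neighbor, entries, delivered):
    """Special PSNP sent exactly once; if it is lost, nothing is requested."""
    return neighbor.missing_from(entries) if delivered else []


def periodic_csnp(neighbor, full_lspdb, delivery_pattern):
    """CSNP resent every interval; returns (interval, requested LSP IDs)
    for the first CSNP that arrives, or None if every one was lost."""
    for interval, delivered in enumerate(delivery_pattern, start=1):
        if delivered:
            return interval, neighbor.missing_from(full_lspdb)
    return None


psnp = {"A.00-01": 100}
case_a = one_shot_psnp(Neighbor({"A.00-01": 100}), psnp, delivered=True)
case_b = one_shot_psnp(Neighbor(), psnp, delivered=True)
case_c = one_shot_psnp(Neighbor(), psnp, delivered=False)
# Same gap as case c), but the first two periodic CSNPs are lost as well.
recovered = periodic_csnp(Neighbor(), psnp, [False, False, True])
print(case_a, case_b, case_c, recovered)
```

In this model case c) leaves the gap unrepaired because the one-shot PSNP is never resent, while the periodic CSNP closes it on the third interval despite two losses.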
If you consider the cost of sending/receiving a PSNP to be roughly equivalent to the cost of sending/receiving an LSP, you will have created the equivalent of full mesh flooding every second, since every node can expect to receive a PSNP from every neighbor whenever an LSP update is triggered. NOTE: The relative impact will be more noticeable when a small number of LSPs are updated.

And since the node which is responsible for flooding to a particular neighbor should be doing so reliably, under most circumstances the special PSNP is not needed at all – so why choose an aggressive time interval for sending it? Periodic CSNPs are sufficient – they are typically sent at a slow rate (tens of seconds) – and (from your response below) you apparently intend to send periodic CSNPs also (though the draft does not mention this).

I am not seeing the benefit of the special PSNP – but if you are committed to this, please provide in the draft a more robust description of how they should be used and an analysis of the benefits under some realistic flooding scenarios.

Les

From: Tony Przygienda <[email protected]>
Sent: Friday, November 25, 2022 1:06 AM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: [email protected]; [email protected]
Subject: Re: [Lsr] Questions on draft-white-lsr-distoptflood

Les, a bit of a delay since I had to think a bit about your comments to do them justice, and this is a bit long'ish.

1. So, to start with a cut-and-dried summary and the reasoning for it: I am firmly against adding signaling to the whole thing by any means (or rather, against any procedures that act upon distribution of info about the algorithm used by any of the nodes involved; i.e., I'm OK with having the algorithm advertised solely for info purposes, though I don't see what function it serves except detecting nodes that do not yet reduce during a network transition or maybe, as you say, detecting algorithm mismatch). More detailed reasoning follows:

a.
First reason is the fact that the additional flexibility of maybe one day having some better hash algorithm will add a very serious amount of complexity in implementation/behavior if we are talking about adding it to the centralized variant of the dynamic flooding draft and having a leader advertise the algorithm.

i. Backup machinery needs to be added/spec'ed properly. What does the network do if the backup has a different algorithm than the current leader? First we would have a transition phase where some nodes have the old algorithm and some the new; the network may stop converging for a bit that way; worst case we partition the PGL algorithm advertisement from new nodes so we have to wait CSNP * diameter etc. A big network blip is the result. I know there is lots of verbiage in the dynamic flooding draft, but I know the reality of implementing such things, and the costs are extraordinarily high for the bit of flexibility I see you suggesting the whole thing would buy us.

ii. What happens if the PGL doesn't say anything? Default algorithm? Full flooding again? In case of full-flooding regression, all of a sudden one fat finger on the PGL (or the PGL moving unexpectedly due to a fat finger/some other node config changes) can basically crash your network and, worst case, stop convergence if reduction allowed it to converge before but full flooding seriously slows down everything. I know, this would be a network teetering on the edge already, but why have additional daemons hiding in a single point of failure on top?

iii. Lots of remaining subtle things, e.g. to make sure the whole thing works each node has to compute reachability to the leader (not sure that's in the dynamic flooding draft now), otherwise they may use stale LSPs from a leader that is gone/partitioned. This reachability computation will have adverse effects. The timing is unpredictable in the network and may lead to the problems mentioned in i). If nodes don't do the reachability computation we may end up in Paxos unintentionally, BTW.
Generally, I can claim that I lived the PGL in ATM so I've seen the "central leader in IGP" game. Not excited about it from experience, and it was much easier in ATM already due to the hard state of SVCs.

To sum it up again, I see here a suggestion to add a massive amount of complexity/fragility for an assumed, unspecified benefit in the future.

As a footnote: centralization in an IGP is a cardinal sin in my eyes, moving away from the first premise that made distributed routing so successful. I spoke against it and still hold the same opinion, and if that's heresy I'm more than happy to be bumped off the author's list of the dynamic-flooding draft ;-).

So maybe as iv) here: WHAT additional variables in the hash do you imagine would constitute a _better_ algorithm? AFAIS there are none I can imagine, and the current algorithm provides pretty much the best entropy with clearly capped state per node needed to balance per LSP originator/fragment. So instead of "pleading for flexibility for flexibility's sake" I'd rather see you suggesting something that would change/improve the behavior in the future/now in concrete terms, and then let's talk about specifics.

b. Then, as second reason, when talking about a distributed solution, i.e. each node flooding the algorithm it uses: we still do NOT know what to do in case nodes advertise different algorithms, no matter whether it's advertised or not. Shut down the network, fall back to full flooding if one node disagrees (which makes every node a potential attack vector)? We had that kind of discussion before, last on multi-TLV where you were insisting on killing the cap indication, so it would be funny to add it here. Complexity without any concrete benefit whatsoever AFAIS, and lots of ratholes again.

2. To go to your reliable PSNP/CSNP objection now. First, they were never reliable. Neither were LSPs. We can make a very fine argument that if PSNPs/CSNPs are not reliable then ISIS will not converge at all.
We can then start to argue about how many we lose and when, and how one variation of flooding is "more robust" than another. We can actually discover that if the redundancy factor in the graph is higher than the largest fanout then we are in normal ISIS, and hence the reduced flooding redundancy factor (in the extreme case it's basically infinity for the existing flooding algorithm in ISIS) + PSNP unreliability are two variables (plus network radius + origination rates + etc.) which in the extreme case can be shown to no longer converge the network no matter the flooding (e.g. if the re-origination rate + radius is higher than the propagation time under CSNP/PSNP losses). In short, the objection brings nothing new to the table, Les; this has been around forever, and we're talking here about the fact that any flooding reduction makes flooding somewhat "less" reliable. That's trivia.

b. To more productive arguments: the solution does NOT reduce the full CSNP advertisement and this will fix any bug in an algorithm. We agree that far I think.

3. Then, let's have the up-to-date PSNP in the glossary with a better name, e.g. "consistency assuring PSNP" or CA-PSNP, which describes better what it is. It cannot hurt. It goes like this (which I thought was already decently clear in the draft, but there's nothing wrong in spelling it out):

a) The algorithm figures out during computation that LSP-ID X/fragment Y is NOT flooded on since other RNL members took over. Now, the corresponding LSP-ID X/fragment Y is put on the PSNP queue of all the members in TN that are your neighbors (optimization here) or, as the draft says, "all your neighbors", which is a bit too conservative. Flood out those PSNPs on a second timer unless they were killed during normal ISIS processing rules or already went out. Observe that NO changes are made to normal ISIS CSNP/LSP/PSNP processing here except dropping those PSNPs into the corresponding queues to go out.
If the neighbor gets the PSNP and interprets it as something newer, normal procedures kick in. If it already has it, nothing will really happen per normal procedures. If your implementation is very conservative you can choose super conservative constants yourself, e.g. unless you see triple coverage by other RNLs you flood nevertheless. Or if it turns out you send PSNPs to your neighbors in expectation that they covered the TNLs and you get requests back, either the other TNLs are dead slow or something is off, and an alarm can be given as in "flooding reduction here struggles". Nothing to do with this solution; this will happen with any type of flood reduction, chokepoints may get created (and observe that this draft load balances flooding and does not only reduce it, one of the lessons I learned implementing those things in my earlier lives ;-)

So, to sum up the argument chain, I err on the side of simplicity here since, from experience, simplicity allows us to deploy and stand straight-faced in front of customers with very large, dense networks. This draft consists of 12 pages including examples and about 4-5 pages of boilerplate. And on top of that it is based on old, clean work, and pretty much everything in it is proven by implementation and prior art IME. This vs. an adopted design-by-committee draft of 46 pages that at this point in time, I think, does not standardize any interoperability but standardizes how to find out why things don't interoperate, due to all possible combinations of centralized vs. distributed plus bring-your-own-algorithm on top by every vendor (based on my last read of it) ...

-- tony

On Wed, Nov 23, 2022 at 1:14 AM Les Ginsberg (ginsberg) <[email protected]> wrote:

Draft authors -

The WG adoption call reminded me that I had some questions following the presentation of this draft at IETF 114 which we decided to "take to the list" - but we/I never did.
Looking at the minutes, there was this exchange:

<snip>
Les: I'm not convinced that you don't need to advertise whether a node needs support this. If not, why not define this as an algorithm and use the dynamic flooding?
Tony P: First bring me a case why we need to signal this.
Les: If I'm not going to flood and I'm expecting someone else to flood, and I don't know whether we're in sync.
Tony: Think it through, the mix with old nodes just fine. The old guy still do the full flooding and that's fine.
Les: You use the term up-to-date PSNP, I have no idea how you determine whether the PSNP is "up-to-date"? unlike CSNP, PSNP doesn't have the info.
Tony: You have to list all those things.
Les: Let's take it to the list.
<end snip>

Question #1: Why not define this as an algorithm and use draft-ietf-lsr-dynamic-flooding (in distributed mode)?

This question is of significance both from a correctness standpoint and what track (Informational or Standard) the draft should target. Tony P's reply above suggests this isn't needed - but I don't think this is true. The draft itself says in Section 2.1:

<snip>
Once this flooding group is determined, the members of the flooding group will each (independently) choose which of the members should re-flood the received information. Each member of the flooding group calculates this independently of all the other members, but a common hash MUST be used across a set of shared variables so each member of the group comes to the same conclusion.
<end snip>

If a "common hash MUST be used across a set of shared variables" (and I agree that it MUST) then all nodes which support the optimization MUST agree to use the same algorithm. Given that there are likely many hash algorithms which could be used, some way to signal the algorithm in use seems to be required.
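Why the hash has to be common can be shown with a small sketch. It is not the hash from the draft: it swaps in a rendezvous-style selection (highest hash weight wins) purely for illustration, and `pick_reflooder`, `disagreements`, the system IDs, and the LSP IDs are all hypothetical. SHA-256 and SHA-512 stand in for two vendors choosing different algorithms.

```python
import hashlib


def pick_reflooder(members, lsp_id, algo="sha256"):
    """Return the flooding-group member with the highest hash weight for this LSP."""
    def weight(member):
        digest = hashlib.new(algo, f"{member}|{lsp_id}".encode()).digest()
        return int.from_bytes(digest, "big")
    # Sorting first makes the result independent of the order members were learned in.
    return max(sorted(members), key=weight)


def disagreements(members, lsp_ids, algo_a, algo_b):
    """LSP IDs for which nodes running algo_a and algo_b pick different re-flooders."""
    return [lsp_id for lsp_id in lsp_ids
            if pick_reflooder(members, lsp_id, algo_a)
            != pick_reflooder(members, lsp_id, algo_b)]


group = ["0000.0000.0001", "0000.0000.0002", "0000.0000.0003"]
lsps = [f"A.00-{frag:02x}" for frag in range(200)]

same = disagreements(group, lsps, "sha256", "sha256")
mixed = disagreements(group, lsps, "sha256", "sha512")
print(len(same), len(mixed))
```

With one shared algorithm every member reaches the same answer without exchanging anything. With two algorithms in the same group, some LSPs get different answers, so in this model an LSP may be re-flooded by more than one member or by none.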
By publishing a given algorithm (including the hash) and having it assigned an identifier in the registry defined in https://www.ietf.org/archive/id/draft-ietf-lsr-dynamic-flooding-11.html#section-7.3 - and using the Area Leader logic defined in the same draft - consistency is achieved. Without that, I don't think this is guaranteed to work.

Note the issue here has nothing to do with legacy nodes - I agree with Tony P's comment above that legacy nodes do not present a problem - they just limit the benefits.

Question #2: Please define and demonstrate how "up-to-date PSNPs" work to recover from flooding failures.

We know that periodic CSNPs robustly address this issue - and their use has been recommended for flooding reduction solutions over the years. Please define "up-to-date PSNPs" more completely and spend some time demonstrating how they are guaranteed to work - and consider in that discussion that transmission of SNPs of either type is not 100% reliable.

Thanx.

Les

_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
