Tony – Let me try one example – see if it helps.
Summarization is used in the network. But customer identifies a modest number of key nodes where it wants to detect loss of reachability ASAP. Unfortunately, customer is unable to assign addresses which are outside of the summary to these nodes. Customer assigns admin tags to the prefixes of interest and asks the IGP vendor to support advertising reachability to the tagged prefixes in addition to the summary (even though they are covered by the summary). Are we still within the IGP set of responsibilities IYO?

Now, if the ABR so configured loses reachability to one of the tagged prefixes, what should it do? Clearly, it needs to stop advertising reachability for that prefix. But how can this be used to achieve what the customer desires, i.e., fast reaction to the loss of reachability? One can imagine some feature that looks at history, tracks when the summary address was first advertised and when the tagged prefix covered by the summary was first advertised, and tries to deduce what it means if reachability to the tagged prefix is withdrawn - but this is problematic as it depends on history.

What we propose is that if a customer wants to use summaries, they should feel free to do so. But if they want faster detection of loss of reachability to (some) destinations covered by the summary, there is a new advertisement which provides this and avoids the ambiguities mentioned above. Again, the IGP isn’t acquiring new information – it has always known this information – it just hasn’t had a way to advertise this in the presence of summaries. And, the use of tagging to identify the prefixes which may be advertised using the new mechanism is one way to deal with scale issues.

I also want to point out that we are NOT asking IGPs to detect/advertise loss of liveness. Just loss of reachability. Maybe the node associated with the prefix is down – maybe it is up but we no longer have a path to it. We are not asking the ABR to determine why it no longer has reachability.
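
For illustration only, the ambiguity in the example above can be sketched in a few lines of Python. This is a rough model, not any vendor's implementation or the encoding from any draft: the summary 10.1.0.0/16, the two tagged host prefixes, the "reach"/"unreach" labels and both function names are assumptions made up for the sketch. It compares what a remote router can conclude, without history, when the ABR merely withdraws a tagged prefix versus when it sends an explicit unreachability advertisement.

```python
import ipaddress

# Hypothetical addressing, chosen only for this sketch.
SUMMARY = ipaddress.ip_network("10.1.0.0/16")
TAGGED = {
    ipaddress.ip_network("10.1.0.1/32"),
    ipaddress.ip_network("10.1.0.2/32"),
}


def abr_advertisements(reachable, explicit_unreachable):
    """Advertisements the ABR sends outside the area.

    reachable: set of prefixes the ABR currently has a path to.
    explicit_unreachable: True models the proposed new advertisement;
    False models advertising tagged prefixes next to the summary and
    simply withdrawing them when reachability is lost.
    """
    adverts = set()
    # The summary stays up as long as anything inside it is reachable.
    if any(p.subnet_of(SUMMARY) for p in reachable):
        adverts.add(("reach", SUMMARY))
    for prefix in TAGGED:
        if explicit_unreachable:
            if prefix not in reachable:
                adverts.add(("unreach", prefix))
        elif prefix in reachable:
            adverts.add(("reach", prefix))
    return adverts


def receiver_view(adverts, dest):
    """What a remote router concludes from the current adverts alone."""
    dest = ipaddress.ip_network(dest)
    if ("unreach", dest) in adverts:
        return "unreachable"
    if any(kind == "reach" and dest.subnet_of(net) for kind, net in adverts):
        return "reachable"
    return "unknown"


# The ABR has lost its path to 10.1.0.1 but still reaches 10.1.0.2.
still_up = {ipaddress.ip_network("10.1.0.2/32")}
print(receiver_view(abr_advertisements(still_up, False), "10.1.0.1/32"))
print(receiver_view(abr_advertisements(still_up, True), "10.1.0.1/32"))
```

In the withdraw-only case the summary still covers 10.1.0.1, so the receiver cannot distinguish "withdrawn" from "never advertised" unless it keeps history; the explicit advertisement carries that information directly.
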
Les

From: Tony Li <[email protected]> On Behalf Of Tony Li
Sent: Monday, November 29, 2021 3:22 PM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: Hannes Gredler <[email protected]>; Aijun Wang <[email protected]>; Robert Raszuk <[email protected]>; lsr <[email protected]>; Tony Li <[email protected]>; Shraddha Hegde <[email protected]>; Peter Psenak (ppsenak) <[email protected]>
Subject: Re: [Lsr] BGP vs PUA/PULSE

Les,

Thank you for clearly articulating your understanding. One more time, with feeling:

[LES:] I am not convinced either side can claim "consensus" in this discussion. That is a work in progress. 😊

We concur on this point. :)

However, when you say IGPs are (exclusively?) for topology discovery - it seems to suggest that IGP shouldn’t be advertising prefix reachability at all. Hopefully, that is not what you intend.

My position is simple: IGPs provide topology discovery, reachability, and path computation. They do not provide ‘liveness’ and are not intended to. Trying to force an IGP to carry liveness information violates the architecture of the protocol. That was never the problem to be solved. Just because you have a prefix for 1.1.1/24 does NOT imply that 1.1.1.1 will accept your packets or even that there is any host within 1.1.1/24 that will, only that the prefix is supposed to be within the advertiser’s area.

One of the points that still baffles me is the assertion of an architectural violation in the IGP proposals. It is OK for IGPs to advertise all prefixes covered by a summary (i.e., do not summarize).

The point of summarization is to create scalability through abstraction. If a domain does NOT want abstraction, that’s perfectly ok. Don’t summarize. Don’t use areas. Run everything as a single flat area. However, if a domain chooses to summarize and then generates innumerable prefixes, you will forgo abstraction and scalability. The implicit implication is that this will somehow work, when it in fact will not.
Further, in the proposals that we’ve seen, the end users will not know about this until the worst possible time: a mass failure. Engineering in a catastrophic failure mode into the protocol violates the architecture and is not acceptable.

It is OK for IGPs to advertise multiple summaries (e.g., multiple /24s instead of a single /16). It is even OK for IGPs to advertise some prefixes covered by a summary along with the summary (don’t know if any implementations do this - but they could). None of this is an "architectural violation".

Hopefully, these violations of abstraction are carefully considered manual exceptions that will not explode in the end user’s face at the worst possible time.

But advertising a summary and signaling the loss of reachability to a specific prefix covered by the summary is seen by some as an architectural violation. Sorry, I still don't understand this argument.

First off, the point is not the loss of reachability. It’s the loss of liveness. This is key. We’re suddenly changing the roles and responsibilities of the IGP. And we’re sacrificing scalability at the same time.

You can not like the approach. You can be concerned about scaling properties (more on that below). You can question the effectiveness of ephemeral advertisements. These kinds of objections/concerns I can easily understand - even if we don’t agree on their significance. But claiming that "IGPs are not supposed to do this"?? Not grokking this. What is it that the IGPs are supposed to do?

As mentioned, liveness was not one of those things. Ever. A node goes down in an area and now we need to signal this outside of the area? That’s a major scope creep.

We have not added any new information to the IGP itself. We are only suggesting a new form of advertisement to signal some information already known to the IGP, but which is currently not advertised (in some deployments) by the configuration of summaries.
More specific prefixes (specifically host routes) outside of their area is certainly new information.

[LES:] The questions of scale (as I have previously commented) are very legitimate - and more has to be specified before an IGP solution would be considered ready for deployment. But there are tools easily applicable to address this (rate limiting, embedded summarization, perhaps others).

All of the tools that I have seen would seem to break the intended functionality if invoked. Pretty clearly, they are not a robust solution.

The more significant point is to focus on the goal - which in this usage is improved convergence time. When the network is largely stable, convergence improvements can be achieved w/o risk. When widespread failures occur, real time signaling of any type is unlikely to provide improved convergence - which is why the IGPs today shift the focus from convergence to stability by slowing down the rate of updates sent and SPFs performed. This is STILL true even in the fast convergence/FRR era. I see no reason why the same tools should not be used in this case.

There is no issue with stable networks, but that’s not the issue. Our concern is to provide a stable service despite the network having a widespread failure. The tools that are proposed would limit either the number or rate of the negative advertisements. The result is that there would NOT be rapid convergence. That’s to say that the proposed solutions really don’t do what they claim to do. Why would we intentionally defraud our users?

Tony
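
As a back-of-the-envelope illustration of the rate-limiting point debated above, the following Python sketch estimates how long the last negative advertisement waits when an ABR paces them at a fixed rate, and how many are never sent when a hard cap applies. The pacing model, both function names and the example numbers are assumptions for this sketch only; none of them comes from the proposals under discussion.

```python
def last_notice_delay(failed_prefixes: int, notices_per_second: float) -> float:
    """Seconds until the last negative advertisement is sent.

    Assumes the first one goes out immediately and the rest are paced
    at a fixed rate (a deliberately simple, hypothetical model).
    """
    if failed_prefixes <= 0:
        return 0.0
    return (failed_prefixes - 1) / notices_per_second


def suppressed_by_cap(failed_prefixes: int, max_notices: int) -> int:
    """Number of failed prefixes that get no advertisement under a hard cap."""
    return max(0, failed_prefixes - max_notices)


# Single failure in a stable network versus a mass failure,
# with an assumed limit of 10 advertisements per second or 100 in total.
print(last_notice_delay(1, 10))
print(last_notice_delay(1000, 10))
print(suppressed_by_cap(1000, 100))
```

Under these assumed numbers a single failure is signaled at once, while in a mass failure the last of 1000 prefixes waits on the order of 100 seconds, or 900 prefixes go unsignaled under the cap. Whether that trade-off is acceptable is exactly the disagreement in the thread.
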
_______________________________________________ Lsr mailing list [email protected] https://www.ietf.org/mailman/listinfo/lsr
