Re: [Lsr] BGP vs PUA/PULSE

Les Ginsberg (ginsberg) Mon, 29 Nov 2021 16:39:29 -0800

Tony –

Let me try one example – see if it helps.


Summarization is used in the network.
But the customer identifies a modest number of key nodes where it wants to 
detect loss of reachability ASAP. Unfortunately, the customer is unable to 
assign addresses which are outside of the summary to these nodes.
The customer assigns admin tags to the prefixes of interest and asks the IGP 
vendor to support advertising reachability to the tagged prefixes in addition 
to the summary (even though they are covered by the summary).
Are we still within the IGP set of responsibilities IYO?

Now, if the ABR so configured loses reachability to one of the tagged prefixes, 
what should it do?
Clearly, it needs to stop advertising reachability for that prefix. But how can 
this be used to achieve what the customer desires i.e., fast reaction to the 
loss of reachability?
One can imagine some feature that looks at history, tracks when the summary 
address was first advertised and when the tagged prefix covered by the summary 
was first advertised, and tries to deduce what it means when reachability to 
the tagged prefix is withdrawn - but this is problematic as it depends on history.
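
The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `abr_advertisements` and `best_match`, the admin tag value 100, and all prefixes are hypothetical, and no real IGP implementation or vendor configuration is implied.

```python
import ipaddress

def abr_advertisements(summary, area_prefixes, leak_tag):
    """Prefixes a hypothetical ABR advertises outside the area.

    area_prefixes maps prefix -> (admin_tag, reachable).
    The summary is advertised while any covered prefix is reachable;
    tagged prefixes are advertised in addition, while they are reachable.
    """
    summary = ipaddress.ip_network(summary)
    covered = {
        ipaddress.ip_network(p): v
        for p, v in area_prefixes.items()
        if ipaddress.ip_network(p).subnet_of(summary)
    }
    adverts = []
    if any(reachable for _, reachable in covered.values()):
        adverts.append(summary)
    adverts += sorted(
        p for p, (tag, reachable) in covered.items()
        if tag == leak_tag and reachable
    )
    return [str(p) for p in adverts]

def best_match(adverts, address):
    """Longest-prefix match as seen by a router outside the area."""
    addr = ipaddress.ip_address(address)
    nets = [ipaddress.ip_network(p) for p in adverts]
    matches = [n for n in nets if addr in n]
    return str(max(matches, key=lambda n: n.prefixlen)) if matches else None

# Hypothetical area: two tagged key nodes plus an untagged subnet.
area = {
    "10.1.0.1/32": (100, True),
    "10.1.0.2/32": (100, True),
    "10.1.5.0/24": (0, True),
}
before = abr_advertisements("10.1.0.0/16", area, leak_tag=100)

# The ABR loses reachability to one key node and withdraws its prefix.
area["10.1.0.2/32"] = (100, False)
after = abr_advertisements("10.1.0.0/16", area, leak_tag=100)

print(best_match(before, "10.1.0.2"))  # 10.1.0.2/32
print(best_match(after, "10.1.0.2"))   # 10.1.0.0/16
```

After the withdrawal, a remote router still finds a route to the lost node via the summary, so the withdrawal by itself does not tell it that the node has become unreachable.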

What we propose is that if a customer wants to use summaries, they should feel 
free to do so. But if they want faster detection of loss of reachability to 
(some) destinations covered by the summary, there is a new advertisement which 
provides this and avoids the ambiguities mentioned above.
Again, the IGP isn’t acquiring new information – it has always known this 
information – it just hasn’t had a way to advertise this in the presence of 
summaries.
And, the use of tagging to identify the prefixes which may be advertised using 
the new mechanism is one way to deal with scale issues.
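
As a rough sketch of the selection logic described here, assuming the ABR keeps the set of tagged prefixes it could reach before and after a topology change: the function name `unreachable_announcements` and the prefixes are hypothetical, and this does not represent the encoding or procedures of any specific draft.

```python
import ipaddress

def unreachable_announcements(summary, tagged_before, tagged_now):
    """Tagged prefixes that need an explicit loss-of-reachability signal.

    Only prefixes that were reachable, no longer are, and are hidden
    behind the summary qualify. A prefix outside the summary is handled
    by an ordinary withdrawal and needs nothing new.
    """
    summary = ipaddress.ip_network(summary)
    lost = set(tagged_before) - set(tagged_now)
    return sorted(
        p for p in lost if ipaddress.ip_network(p).subnet_of(summary)
    )

# Hypothetical state before and after a failure inside the area.
reachable_before = {"10.1.0.1/32", "10.1.0.2/32", "192.0.2.1/32"}
reachable_now = {"10.1.0.1/32"}

signals = unreachable_announcements(
    "10.1.0.0/16", reachable_before, reachable_now
)
print(signals)  # ['10.1.0.2/32']
```

Restricting the candidates to tagged prefixes is what keeps the number of possible announcements bounded by the number of key nodes the operator has chosen.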

??

I also want to point out that we are NOT asking IGPs to detect/advertise loss 
of liveness. Just loss of reachability. Maybe the node associated with the 
prefix is down – maybe it is up but we no longer have a path to it. We are not 
asking the ABR to determine why it no longer has reachability.

   Les


From: Tony Li <[email protected]> On Behalf Of Tony Li
Sent: Monday, November 29, 2021 3:22 PM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: Hannes Gredler <[email protected]>; Aijun Wang <[email protected]>; 
Robert Raszuk <[email protected]>; lsr <[email protected]>; Tony Li 
<[email protected]>; Shraddha Hegde <[email protected]>; Peter Psenak 
(ppsenak) <[email protected]>
Subject: Re: [Lsr] BGP vs PUA/PULSE


Les,

Thank you for clearly articulating your understanding.  One more time, with 
feeling:


[LES:] I am not convinced either side can claim "consensus" in this discussion. 
That is a work in progress. 😊


We concur on this point. :)



However, when you say IGPs are (exclusively?) for topology discovery - it seems 
to suggest that IGP shouldn’t be advertising prefix reachability at all. 
Hopefully, that is not what you intend.


My position is simple: IGPs provide topology discovery, reachability, and path 
computation. They do not provide ‘liveness’ and are not intended to. Trying to 
force an IGP to carry liveness information violates the architecture of the 
protocol. That was never the problem to be solved. Just because you have a 
prefix for 1.1.1/24 does NOT imply that 1.1.1.1 will accept your packets or 
even that there is any host within 1.1.1/24 that will, only that the prefix is 
supposed to be within the advertiser’s area.


One of the points that still baffles me is the assertion of an architectural 
violation in the IGP proposals.

It is OK for IGPs to advertise all prefixes covered by a summary (i.e., do not 
summarize).


The point of summarization is to create scalability through abstraction. If a 
domain does NOT want abstraction, that’s perfectly ok. Don’t summarize. Don’t 
use areas. Run everything as a single flat area.

However, if a domain chooses to summarize and then generates innumerable 
prefixes, it forgoes abstraction and scalability. The implication 
is that this will somehow work, when in fact it will not. Further, in the 
proposals that we’ve seen, the end users will not know about this until the 
worst possible time: a mass failure. Engineering a catastrophic failure mode 
into the protocol violates the architecture and is not acceptable.



It is OK for IGPs to advertise multiple summaries (e.g., multiple /24s instead 
of a single /16).
It is even OK for IGPs to advertise some prefixes covered by a summary along 
with the summary (don’t know if any implementations do this - but they could).
None of this is an "architectural violation".
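
For a sense of the granularity options listed above, here is a small sketch using Python's standard `ipaddress` module. The 10.1.0.0/16 summary and the host prefix are hypothetical example values.

```python
import ipaddress

summary = ipaddress.ip_network("10.1.0.0/16")

# Option: advertise multiple /24 summaries instead of a single /16.
finer = list(summary.subnets(new_prefix=24))
print(len(finer))            # 256
print(finer[0], finer[-1])   # 10.1.0.0/24 10.1.255.0/24

# Option: advertise one covered prefix alongside the summary.
host = ipaddress.ip_network("10.1.7.9/32")
print(host.subnet_of(summary))  # True
```

Each step toward finer granularity trades abstraction for visibility: one /16 becomes 256 advertisements at /24.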


Hopefully, these violations of abstraction are carefully considered manual 
exceptions that will not explode in the end user’s face at the worst possible 
time.


But advertising a summary and signaling the loss of reachability to a specific 
prefix covered by the summary is seen by some as an architectural violation.
Sorry, I still don't understand this argument.


First off, the point is not the loss of reachability. It’s the loss of 
liveness.  This is key. We’re suddenly changing the roles and responsibilities 
of the IGP. And we’re sacrificing scalability at the same time.


You can not like the approach. You can be concerned about scaling properties 
(more on that below). You can question the effectiveness of ephemeral 
advertisements.
These kinds of objections/concerns I can easily understand - even if we don’t 
agree on their significance.
But claiming that "IGPs are not supposed to do this"??
Not grokking this.


What is it that the IGPs are supposed to do? As mentioned, liveness was not one 
of those things. Ever.  A node goes down in an area and now we need to signal 
this outside of the area? That’s a major scope creep.


We have not added any new information to the IGP itself. We are only suggesting 
a new form of advertisement to signal some information already known to the 
IGP, but which is currently not advertised (in some deployments) by the 
configuration of summaries.


More specific prefixes (specifically host routes) advertised outside of their 
area are certainly new information.


[LES:] The questions of scale (as I have previously commented) are very 
legitimate - and more has to be specified before an IGP solution would be 
considered ready for deployment. But there are tools easily applicable to 
address this (rate limiting, embedded summarization, perhaps others).


All of the tools that I have seen would seem to break the intended 
functionality if invoked. Pretty clearly, they are not a robust solution.



The more significant point is to focus on the goal - which in this usage is 
improved convergence time.
When the network is largely stable, convergence improvements can be achieved 
w/o risk.
When widespread failures occur, real time signaling of any type is unlikely to 
provide improved convergence - which is why the IGPs today shift the focus from 
convergence to stability by slowing down the rate of updates sent and SPFs 
performed. This is STILL true even in the fast convergence/FRR era.
I see no reason why the same tools should not be used in this case.


There is no problem with stable networks, but that’s not the point. Our concern 
is to provide a stable service despite the network having a widespread failure.
The tools that are proposed would limit either the number or rate of the 
negative advertisements. The result is that there would NOT be rapid 
convergence. That’s to say that the proposed solutions really don’t do what 
they claim to do. Why would we intentionally defraud our users?
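
The arithmetic behind this objection can be sketched as follows. The function names and the numbers (5000 failed prefixes, 100 advertisements per second, a cap of 500) are hypothetical values chosen only to illustrate the trade-off; they do not come from any proposal.

```python
import math

def seconds_until_last_signal(failed_prefixes, max_per_second):
    """Whole seconds before the last negative advertisement is sent
    when the ABR rate-limits its negative advertisements."""
    return math.ceil(failed_prefixes / max_per_second)

def unsignalled_after_cap(failed_prefixes, max_advertisements):
    """Failures never signalled when the ABR caps the total count."""
    return max(0, failed_prefixes - max_advertisements)

# Largely stable network: a handful of failures go out almost at once.
print(seconds_until_last_signal(5, 100))      # 1

# Widespread failure: the last receiver waits much longer.
print(seconds_until_last_signal(5000, 100))   # 50

# With a hard cap, most failures are never signalled at all.
print(unsignalled_after_cap(5000, 500))       # 4500
```

Under these assumed numbers, either limit protects the IGP during a mass failure, but only by delaying or dropping the very signals that were meant to speed up convergence.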

Tony

_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
