Re: [Lsr] BGP vs PUA/PULSE

Peter Psenak Mon, 24 Jan 2022 04:35:33 -0800

Chris,

On 24/01/2022 10:29, Christian Hopps wrote:


"Aijun Wang" <[email protected]> writes:

Hi, Chris:
We should note here that it is not "a prefix"; it could be all nodes'
loopback addresses, or even some links' addresses.
Gyan's reference to RFC 5302 clearly states the disadvantage of
not summarizing, and operators have followed this approach for about 20
years. Are you now proposing to go in a different direction?


For 20 years we haven't needed PUA/PULSE, now you're saying we do, so I'm saying
don't use summarization *for these special prefixes it suddenly doesn't work
for*.

I have *never* said do not use summarization. I have tried very hard to say very
clearly "for those special prefixes" every time I have responded to this
thread. It's very frustrating.

I'm saying do not summarize these "super important prefixes" -- the prefixes for
which you want to modify the summarization process because summarization doesn't
work for them.

Again KISS applies here:

       If the summarization process *doesn't work* for a given prefix P, then 
*don't use summarization* for prefix P!



The above simply does not work.

1. So far nobody summarizes and it all works. True, the reason being that the number of PEs in the network is typically below 10k. Also, summarization with the MPLS data plane is problematic.

We are getting requests to design next-gen networks that will include 100k PEs. Summarization is essential at such scale.

2. We are only talking about PE addresses here, not the infrastructure links obviously - those are filtered out using other techniques. All the PE addresses are equally important; it's not possible to make only some of them important and others not.
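
A minimal sketch of the underlying issue, under stated assumptions: the addressing (10.1.0.0/16 as the area summary, three PE loopbacks) and the helper names `advertised` and `withdrawn_on_failure` are hypothetical, and the ABR is modeled very simply as leaking either the summary or the individual /32s. It only illustrates that, with summarization, the failure of one PE changes nothing in what remote areas see, whereas without summarization the /32 is withdrawn.

```python
import ipaddress

# Hypothetical addressing: all PE loopbacks of one area fall inside one summary.
SUMMARY = ipaddress.ip_network("10.1.0.0/16")
PE_LOOPBACKS = {
    "pe1": ipaddress.ip_network("10.1.0.1/32"),
    "pe2": ipaddress.ip_network("10.1.0.2/32"),
    "pe3": ipaddress.ip_network("10.1.0.3/32"),
}


def advertised(loopbacks, summarize):
    """Prefixes a (simplified) ABR leaks out of the area."""
    if summarize:
        # The summary stays advertised while at least one component is reachable.
        covered = any(lo.subnet_of(SUMMARY) for lo in loopbacks.values())
        return {SUMMARY} if covered else set()
    return set(loopbacks.values())


def withdrawn_on_failure(loopbacks, failed, summarize):
    """Prefixes that disappear from remote areas when one PE fails."""
    before = advertised(loopbacks, summarize)
    remaining = {name: lo for name, lo in loopbacks.items() if name != failed}
    after = advertised(remaining, summarize)
    return before - after


print("summarized:    ", withdrawn_on_failure(PE_LOOPBACKS, "pe2", summarize=True))
print("not summarized:", withdrawn_on_failure(PE_LOOPBACKS, "pe2", summarize=False))
```

In this toy model the summarized case withdraws nothing when pe2 fails, which is the gap the PUA/PULSE discussion is about; it says nothing about which remedy is preferable.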





thanks,
Peter


Thanks,
Chris.
[As wg member]

Best Regards

Aijun Wang
China Telecom

-----Original Message-----
From: Christian Hopps <[email protected]>
Sent: Monday, January 24, 2022 1:50 PM
To: Gyan Mishra <[email protected]>
Cc: Christian Hopps <[email protected]>; Aijun Wang <[email protected]>;
Hannes Gredler <[email protected]>; John E Drake <[email protected]>; Les
Ginsberg (ginsberg) <[email protected]>; Peter Psenak (ppsenak)
<[email protected]>; Robert Raszuk <[email protected]>; Shraddha Hegde
<[email protected]>; Tony Li <[email protected]>; lsr <[email protected]>
Subject: Re: [Lsr] BGP vs PUA/PULSE


Ok, I guess I'll repeat what I said, as I don't believe anything new was 
presented here.

     Yes, having worked intimately with these IGPs for > 20 years now,
     I understand the use and the implications of using summary
     routes. :)

     My opinion remains unchanged.

"If a prefix is important enough to consider seriously hacking the routing
protocol to signal the prefix being unreachable, then that prefix is important
enough to not summarize to begin with." IOW; KISS

I'd prefer to not keep repeating this when presented with the same arguments, 
so please take any silence on my part as my opinion being unchanged.

Thanks,
Chris.
[As WG member]



Gyan Mishra <[email protected]> writes:

Hi Chris


Just about every vendor's recommended best practice is to
lay out the address plan to take advantage of summarization wherever
possible, and that includes the PE loopback next-hop addresses as well, to
limit the router load as well as the size of the LSDB, in the backbone as well
as domain wide.

I think you would be hard pressed to find any vendor that would say go
ahead and flood loopbacks domain wide and don’t summarize.

In large domains, flooding domain wide is not feasible and
summarization is a requirement even for the critical loopback BGP next
hops for most operators.

RFC 5302 talks about the ramifications of flooding in an IS-IS domain in
section 1.2, excerpted below:


1.2.  Scalability

    The disadvantage to performing the domain-wide prefix distribution
    described above is that it has an impact on the scalability of IS-IS.
    Areas within IS-IS help scalability in that LSPs are contained within
    a single area.  This limits the size of the link state database,
    which in turn limits the complexity of the shortest path computation.

    Further, the summarization of the prefix information aids scalability
    in that the abstraction of the prefix information removes the sheer
    number of data items to be transported and the number of routes to be
    computed.

    It should be noted quite strongly that the distribution of prefixes
    on a domain-wide basis impacts the scalability of IS-IS in the second
    respect.  It will increase the number of prefixes throughout the
    domain.  This will result in increased memory consumption,
    transmission requirements, and computation requirements throughout
    the domain.

    It must also be noted that the domain-wide distribution of prefixes
    has no effect whatsoever on the first aspect of scalability, namely
    the existence of areas and the limitation of the distribution of the
    link state database.
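
As a rough illustration of the second scalability aspect the excerpt describes (the number of prefixes carried), the sketch below uses Python's standard `ipaddress.collapse_addresses` to aggregate a block of contiguous loopback /32s into a single prefix. The addresses and the count of 256 are made up for illustration; a real address plan and real ABR summarization configuration would differ.

```python
import ipaddress

# Hypothetical: 256 PE loopbacks numbered contiguously out of 10.1.0.0/24.
loopbacks = [ipaddress.ip_network(f"10.1.0.{i}/32") for i in range(256)]

# collapse_addresses merges adjacent and contained networks into the
# smallest equivalent set of prefixes.
summaries = list(ipaddress.collapse_addresses(loopbacks))

print(len(loopbacks), "host routes carried domain wide without summarization")
print(len(summaries), "prefix(es) carried with summarization:", summaries)
```

The same ratio is what makes summarization attractive at the 100k-PE scale mentioned in this thread, at the cost of hiding the state of the individual /32s.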




Gyan
On Fri, Jan 14, 2022 at 9:07 PM Christian Hopps <[email protected]>
wrote:

     Yes, having worked intimately with these IGPs for > 20 years now,
     I understand the use and the implications of using summary
     routes. :)

     My opinion remains unchanged.

     Thanks,
     Chris.
     [as wg member]

     > On Jan 14, 2022, at 8:50 PM, Aijun Wang <
     [email protected]> wrote:
     >
     > Hi, Christian:
     >
     > We should consider the balance and efficiency of summarizing
     versus not summarizing.
     > If you don’t summarize, then all the areas will be filled with
     the specific detailed routes (all PEs’ loopbacks, possibly also
     all Ps’ loopbacks). This will certainly increase the burden on the
     routers.
     >
     > But with summarization, these specific routes need not exist in
     the routing table. The nodes within the IGP need only be notified
     when a node fails, to accelerate the switchover of the
     overlay service.
     > And you can also choose not to use such a mechanism; then the
     service will be blackholed for some time until the service/
     application detects the abnormality.
     > PUA/PULSE are just mechanisms to reduce the duration of the
     abnormal state; they are a kind of FRR technique.
     >
     > Aijun Wang
     > China Telecom
     >
     >> On Jan 15, 2022, at 09:26, Christian Hopps <[email protected]>
     wrote:
     >>
     >>
     >>
     >>> On Jan 14, 2022, at 8:25 PM, Christian Hopps <
     [email protected]> wrote:
     >>>
     >>> I understand the proposal. As I've stated elsewhere, I do not
     believe there is a problem here that needs solving. The "problem"
     was created by the user by summarizing prefixes that should not
     have been summarized -- they mis-configured their network. The
     routing protocols work just fine (act very quickly) if you don't
     incorrectly summarize "really important prefixes".
     >>>
     >>> I was simply pointing out that IGPs also don't deal in
     liveness since that keeps coming up.
     >>
     >> Sorry that was "as wg member".
     >>
     >>>
     >>> Thanks,
     >>> Chris.
     >>>
     >>>>> On Jan 14, 2022, at 8:06 PM, Aijun Wang <
     [email protected]> wrote:
     >>>>
     >>>> Hi, Christian and John:
     >>>>
     >>>> No. I think you all may misunderstand the proposal. What we
     are detecting is actually the reachability/liveness of the node that
     is connected to the application, not the application itself.
     >>>> And I think node liveness is the same as node
     reachability. Either will influence or break the path to the
     connected service if the node's forwarding function fails.
     >>>>
     >>>> Aijun Wang
     >>>> China Telecom
     >>>>
     >>>>> On Jan 15, 2022, at 08:56, Christian Hopps <
     [email protected]> wrote:
     >>>>>
     >>>>> Indeed, and in fact the IGP should only be dealing with the
     reachability to the node, not with the node's or application's
     liveness.
     >>>>>
     >>>>> Thanks,
     >>>>> Chris.
     >>>>> [as wg member]
     >>>>>
     >>>>>> On Jan 14, 2022, at 7:47 PM, John E Drake <
     [email protected]> wrote:
     >>>>>>
     >>>>>> I don’t think so.  Today things just work, at a given time
     scale.  What you said you are trying to do is reduce the time
     scale for detecting that an application on a node has failed.
     However, conflating the health of a node with the health of an
     application on that node seems to be inherently flawed.
     >>>>>>
     >>>>>> Yours Irrespectively,
     >>>>>>
     >>>>>> John
     >>>>>>
     >>>>>>
     >>>>>> Juniper Business Use Only
     >>>>>> From: Aijun Wang <[email protected]>
     >>>>>> Sent: Friday, January 14, 2022 7:40 PM
     >>>>>> To: John E Drake <[email protected]>
     >>>>>> Cc: Les Ginsberg (ginsberg) <[email protected]>; Robert
     Raszuk <[email protected]>; Christian Hopps <[email protected]>;
     Shraddha Hegde <[email protected]>; Tony Li <[email protected]>;
     Hannes Gredler <[email protected]>; lsr <[email protected]>; Peter
     Psenak (ppsenak) <[email protected]>
     >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
     >>>>>>
     >>>>>> [External Email. Be cautious of content]
     >>>>>>
     >>>>>> When the node is up, all subsequent processing is passed
     to the application layer. This is the normal procedure the
     IGP should follow.
     >>>>>> According to your logic, are IGPs solving the wrong problem
     now?
     >>>>>>
     >>>>>> Aijun Wang
     >>>>>> China Telecom
     >>>>>>
     >>>>>>
     >>>>>> On Jan 15, 2022, at 08:30, John E Drake <jdrake=
     [email protected]> wrote:
     >>>>>>
     >>>>>>
     >>>>>> Correct, but as Tony, Robert and I have noted, a node
     being up does not mean that an application on that node is up,
     which means that your proposed solution is probably a solution to
     the wrong problem.  Further, Robert’s solution is probably a
     solution to the right problem.
     >>>>>>
     >>>>>> Yours Irrespectively,
     >>>>>>
     >>>>>> John
     >>>>>>
     >>>>>>
     >>>>>> Juniper Business Use Only
     >>>>>> From: Aijun Wang <[email protected]>
     >>>>>> Sent: Friday, January 14, 2022 5:53 PM
     >>>>>> To: John E Drake <[email protected]>
     >>>>>> Cc: Robert Raszuk <[email protected]>; Les Ginsberg
     (ginsberg) <[email protected]>; Christian Hopps <
     [email protected]>; Shraddha Hegde <[email protected]>; Tony
     Li <[email protected]>; Hannes Gredler <[email protected]>; lsr <
     [email protected]>; Peter Psenak (ppsenak) <[email protected]>
     >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
     >>>>>>
     >>>>>> [External Email. Be cautious of content]
     >>>>>>
     >>>>>> Hi, John:
     >>>>>> Please note that if the node is down, the service cannot be
     accessed.
     >>>>>> We are discussing the “DOWN” notification, not the “UP”
     notification.
     >>>>>>
     >>>>>> Aijun Wang
     >>>>>> China Telecom
     >>>>>>
     >>>>>>
     >>>>>> On Jan 15, 2022, at 00:25, John E Drake <jdrake=
     [email protected]> wrote:
     >>>>>>
     >>>>>>
     >>>>>> Hi,
     >>>>>>
     >>>>>> Comment inline below.
     >>>>>>
     >>>>>> Yours Irrespectively,
     >>>>>>
     >>>>>> John
     >>>>>>
     >>>>>>
     >>>>>> Juniper Business Use Only
     >>>>>> From: Lsr <[email protected]> On Behalf Of Robert
     Raszuk
     >>>>>> Sent: Monday, January 10, 2022 7:15 PM
     >>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
     >>>>>> Cc: Christian Hopps <[email protected]>; Aijun Wang <
     [email protected]>; Shraddha Hegde <[email protected]
     >; Tony Li <[email protected]>; Hannes Gredler <[email protected]>;
     lsr <[email protected]>; Peter Psenak (ppsenak) <[email protected]>
     >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
     >>>>>>
     >>>>>> [External Email. Be cautious of content]
     >>>>>>
     >>>>>> Hi Les,
     >>>>>>
     >>>>>>> You seem focused on the notification delivery mechanism
     only.
     >>>>>>
     >>>>>> Not really. For me, an advertised summary is like a prefix
     when you are dialing a country code. Call signaling knows to go
     north if you are calling a crab shop in Alaska.
     >>>>>>
     >>>>>> Now such direction does not indicate if the shop is open
     or has crabs.
     >>>>>>
     >>>>>> That info you need to get over the top as a service. So I
     am much more in favor of having the service tell you directly or
     indirectly that it is available.
     >>>>>>
     >>>>>> [JD]  Right.  Just because a node is up and connected to
     the network does not imply that a given application is active on
     it.
     >>>>>>
     >>>>>> Best,
     >>>>>> R.
     >>>>>>
     >>>>>>
     >>>>>>
     >>>>>>
     >>>>>>
     >>>>>> On Tue, Jan 11, 2022 at 1:07 AM Les Ginsberg (ginsberg) <
     [email protected]> wrote:
     >>>>>> Robert -
     >>>>>>
     >>>>>> From: Robert Raszuk <[email protected]>
     >>>>>> Sent: Monday, January 10, 2022 2:56 PM
     >>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
     >>>>>> Cc: Tony Li <[email protected]>; Christian Hopps <
     [email protected]>; Peter Psenak (ppsenak) <[email protected]>;
     Shraddha Hegde <[email protected]>; Aijun Wang <
     [email protected]>; Hannes Gredler <[email protected]>;
     lsr <[email protected]>
     >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
     >>>>>>
     >>>>>> Les,
     >>>>>>
     >>>>>> We have received requests from real customers who both
     need to summarize AND would like better response time to loss of
     reachability to individual nodes.
     >>>>>>
     >>>>>> We all agree the request is legitimate.
     >>>>>>
     >>>>>> [LES:] It does not seem to me that everyone does agree on
     that – but I appreciate that you agree.
     >>>>>>
     >>>>>> But do they realize that to practically employ what you
     are proposing (new PDU flooding) requires 100% software upgrade
     to all IGP nodes in the entire network ? Do they also realize
     that to effectively use it requires data plane change (sure
     software but data plane code is not as simple as PI) on all
     ingress PEs ?
     >>>>>>
     >>>>>> [LES:] As far as forwarding, as Peter has indicated, we
     have a POC and it works fine. And there are many possible ways
     for implementations to go.
     >>>>>> It may or may not require 100% software upgrade – but I
     agree a significant number of nodes have to be upgraded to at
     least support pulse flooding.
     >>>>>>
     >>>>>>
     >>>>>> And with scale requirements you are describing it seems
     this would be 1000s of nodes (if not more). That's massive if
     compared to alternative approaches to achieve the same or even
     better results.
     >>>>>>
     >>>>>> [LES:] Be happy to review other solutions if/when someone
     writes them up.
     >>>>>> I think what is overlooked in the discussion of other
     solutions is that reachability info is provided by the IGP. If
     all the IGP advertises is a summary then how would individual
     loss of reachability become known at scale?
     >>>>>> You seem focused on the notification delivery mechanism
     only.
     >>>>>>
     >>>>>> Les
     >>>>>>
     >>>>>> Many thx,
     >>>>>> Robert
     >>>>>>
     >>>>>> _______________________________________________
     >>>>>> Lsr mailing list
     >>>>>> [email protected]
     >>>>>> https://www.ietf.org/mailman/listinfo/lsr
     >>>>>
     >>>>
     >>>
     >>
     >

