Re: [Lsr] BGP vs PUA/PULSE

Peter Psenak Mon, 24 Jan 2022 09:42:53 -0800

On 24/01/2022 16:19, Christian Hopps wrote:


Peter Psenak <[email protected]> writes:

Chris,

On 24/01/2022 10:29, Christian Hopps wrote:

Again KISS applies here:
        If the summarization process *doesn't work* for a given prefix P, then
*don't use summarization* for prefix P!


above simply does not work.

1. so far nobody summarizes and it all works. True, reason being that the number
of PEs in the network is typically below 10k. Also summarization with the MPLS
data plane is problematic.

We are getting requests to design next-gen networks that will include 100k
PEs. Summarization is essential at such scale.


So pick a better design -- seriously.

It seems it's time to think "outside the single IGP box" to handle the PE to PE 
functionality when you start contemplating 100k PEs. Especially when every PE probably 
doesn't need full-mesh hyper-adaptive knowledge of every other PE for doing their jobs.

- multiple IGPs do not solve the problem. You can stick BGP between them, but it brings its own issues. What else do you have?

- the point is we can scale IGPs to these numbers with summarization easily. With summarization you have perfect topology isolation and decent convergence if we solve the problem at hand.

- nobody claims every PE needs to talk to every PE. But any PE in any area may need to talk to some PEs from other areas.


In any case the inelegance of the proposed changes to the routing protocol is 
a giant flashing red warning light that the chosen design is not the right one.


I don't understand the basis of the above statement.

thanks,
Peter


Thanks,
Chris.
[as wg member]


2. We are only talking about PE addresses here, not the infrastructure links
obviously - those are filtered out using other techniques. All the PE addresses
are equally important, it's not possible to make only some of them important,
while others not.




thanks,
Peter

Thanks,
Chris.
[As wg member]

Best Regards

Aijun Wang
China Telecom

-----Original Message-----
From: Christian Hopps <[email protected]>
Sent: Monday, January 24, 2022 1:50 PM
To: Gyan Mishra <[email protected]>
Cc: Christian Hopps <[email protected]>; Aijun Wang <[email protected]>;
Hannes Gredler <[email protected]>; John E Drake <[email protected]>; Les
Ginsberg (ginsberg) <[email protected]>; Peter Psenak (ppsenak)
<[email protected]>; Robert Raszuk <[email protected]>; Shraddha Hegde
<[email protected]>; Tony Li <[email protected]>; lsr <[email protected]>
Subject: Re: [Lsr] BGP vs PUA/PULSE


Ok, I guess I'll repeat what I said, as I don't believe anything new was 
presented here.

      Yes, having worked intimately with these IGPs for > 20 years now,
      I understand the use and the implications of using summary
      routes. :)

      My opinion remains unchanged.

"If a prefix is important enough to consider seriously hacking the routing
protocol to signal the prefix being unreachable, then that prefix is important
enough to not summarize to begin with." IOW; KISS

I'd prefer to not keep repeating this when presented with the same arguments, 
so please take any silence on my part as my opinion being unchanged.

Thanks,
Chris.
[As WG member]



Gyan Mishra <[email protected]> writes:

Hi Chris


Just about every vendor's recommended best practice is to lay out the
address plan to take advantage of summarization wherever possible.
That includes the PE loopbacks used as BGP next hops, in order to
limit router load as well as the size of the LSDB, both in the backbone
and domain wide.

I think you would be hard pressed to find any vendor that would say go
ahead and flood loopbacks domain wide and don’t summarize.

In large domains flooding domain wide is not feasible, and
summarization is a requirement even for the critical loopback BGP next
hops for most operators.

RFC 5302 talks about the ramifications of flooding in ISIS domain in
section 1.2 excerpt below:


1.2.  Scalability

     The disadvantage to performing the domain-wide prefix distribution
     described above is that it has an impact on the scalability of IS-IS.
     Areas within IS-IS help scalability in that LSPs are contained within
     a single area.  This limits the size of the link state database,
     which in turn limits the complexity of the shortest path computation.

     Further, the summarization of the prefix information aids scalability
     in that the abstraction of the prefix information removes the sheer
     number of data items to be transported and the number of routes to be
     computed.

     It should be noted quite strongly that the distribution of prefixes
     on a domain-wide basis impacts the scalability of IS-IS in the second
     respect.  It will increase the number of prefixes throughout the
     domain.  This will result in increased memory consumption,
     transmission requirements, and computation requirements throughout
     the domain.

     It must also be noted that the domain-wide distribution of prefixes
     has no effect whatsoever on the first aspect of scalability, namely
     the existence of areas and the limitation of the distribution of the
     link state database.
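
To make the trade-off in the excerpt concrete, here is a minimal sketch
(not taken from RFC 5302 or from any implementation). It assumes a
hypothetical area whose PE loopbacks are numbered out of 10.1.0.0/22,
and a hypothetical failed PE at 10.1.2.7. It counts how many prefixes
are leaked with and without a summary, and shows that the summary still
covers a loopback whose /32 has been withdrawn.

```python
import ipaddress

# Assumed address plan: one area, PE loopbacks taken from 10.1.0.0/22.
area_block = ipaddress.ip_network("10.1.0.0/22")
pe_loopbacks = [ipaddress.ip_network(f"{host}/32") for host in area_block.hosts()]

# Without summarization every /32 is distributed domain wide.
leaked_without_summary = len(pe_loopbacks)

# With summarization only the covering prefix leaves the area.
summary = [area_block]
leaked_with_summary = len(summary)


def covered(address, advertised):
    """Return True if any advertised prefix contains the address."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in advertised)


# Hypothetical failure: the PE at 10.1.2.7 goes down and its /32 is withdrawn.
failed_pe = "10.1.2.7"
remaining = [n for n in pe_loopbacks if n != ipaddress.ip_network(f"{failed_pe}/32")]

# With specific routes, remote routers see the loss directly.
visible_without_summary = not covered(failed_pe, remaining)
# With only the summary, the failed loopback still looks reachable.
hidden_by_summary = covered(failed_pe, summary)

print(leaked_without_summary, leaked_with_summary)
print(visible_without_summary, hidden_by_summary)
```

The prefix count drops from one route per PE to a single route, which
is the scalability gain the excerpt describes, while the failed
loopback remains covered by the summary, which is the behavior this
thread is arguing about.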




Gyan
On Fri, Jan 14, 2022 at 9:07 PM Christian Hopps <[email protected]>
wrote:

      Yes, having worked intimately with these IGPs for > 20 years now,
      I understand the use and the implications of using summary
      routes. :)

      My opinion remains unchanged.

      Thanks,
      Chris.
      [as wg member]

      > On Jan 14, 2022, at 8:50 PM, Aijun Wang <
      [email protected]> wrote:
      >
      > Hi, Christian:
      >
      > We should weigh the balance and efficiency of summarizing versus
      not summarizing.
      > If you don’t summarize, then all the areas will be filled with
      the specific detail routes (all PEs’ loopbacks, and possibly
      all Ps’ loopbacks). This certainly increases the burden on the
      routers.
      >
      > But with summarization, these specific routes need not exist in
      the routing table. The nodes within the IGP need only be notified
      when a node fails, to accelerate the switchover of the
      overlay service.
      > And you can also choose not to use such a mechanism; then the
      service will be blackholed for some time until the service/
      application detects the failure.
      > PUA/PULSE are just mechanisms to shorten the duration of such
      failures; they are one kind of FRR technique.
      >
      > Aijun Wang
      > China Telecom
      >
      >> On Jan 15, 2022, at 09:26, Christian Hopps <[email protected]>
      wrote:
      >>
      >>
      >>
      >>> On Jan 14, 2022, at 8:25 PM, Christian Hopps <
      [email protected]> wrote:
      >>>
      >>> I understand the proposal. As I've stated elsewhere, I do not
      believe there is a problem here that needs solving. The "problem"
      was created by the user by summarizing prefixes that should not
      have been summarized -- they mis-configured their network. The
      routing protocols work just fine (act very quickly) if you don't
      incorrectly summarize "really important prefixes".
      >>>
      >>> I was simply pointing out that IGPs also don't deal in
      liveness since that keeps coming up.
      >>
      >> Sorry that was "as wg member".
      >>
      >>>
      >>> Thanks,
      >>> Chris.
      >>>
      >>>>> On Jan 14, 2022, at 8:06 PM, Aijun Wang <
      [email protected]> wrote:
      >>>>
      >>>> Hi, Christian and John:
      >>>>
      >>>> No. I think you all may misunderstand the proposal. What we
      are detecting is actually the reachability/liveness of the node that
      is connected to the application, not the application itself.
      >>>> And I think node liveness is the same as node
      reachability. Either will influence or break the path to the
      connected service if the node's forwarding function fails.
      >>>>
      >>>> Aijun Wang
      >>>> China Telecom
      >>>>
      >>>>> On Jan 15, 2022, at 08:56, Christian Hopps <
      [email protected]> wrote:
      >>>>>
      >>>>> Indeed, and in fact the IGP should only be dealing with the
      reachability to the node, not with the node or applications
      liveness.
      >>>>>
      >>>>> Thanks,
      >>>>> Chris.
      >>>>> [as wg member]
      >>>>>
      >>>>>> On Jan 14, 2022, at 7:47 PM, John E Drake <
      [email protected]> wrote:
      >>>>>>
      >>>>>> I don’t think so.  Today things just work, at a given time
      scale.  What you said you are trying to do is reduce the time
      scale for detecting that an application on a node has failed.
      However, conflating the health of a node with the health of an
      application on that node seems to be inherently flawed.
      >>>>>>
      >>>>>> Yours Irrespectively,
      >>>>>>
      >>>>>> John
      >>>>>>
      >>>>>>
      >>>>>> Juniper Business Use Only
      >>>>>> From: Aijun Wang <[email protected]>
      >>>>>> Sent: Friday, January 14, 2022 7:40 PM
      >>>>>> To: John E Drake <[email protected]>
      >>>>>> Cc: Les Ginsberg (ginsberg) <[email protected]>; Robert
      Raszuk <[email protected]>; Christian Hopps <[email protected]>;
      Shraddha Hegde <[email protected]>; Tony Li <[email protected]>;
      Hannes Gredler <[email protected]>; lsr <[email protected]>; Peter
      Psenak (ppsenak) <[email protected]>
      >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
      >>>>>>
      >>>>>> [External Email. Be cautious of content]
      >>>>>>
      >>>>>> When the node is up, all subsequent processing is passed
      to the application layer. This is the normal procedure the
      IGP should follow.
      >>>>>> According to your logic, are IGPs solving the wrong problem
      now?
      >>>>>>
      >>>>>> Aijun Wang
      >>>>>> China Telecom
      >>>>>>
      >>>>>>
      >>>>>> On Jan 15, 2022, at 08:30, John E Drake <jdrake=
      [email protected]> wrote:
      >>>>>>
      >>>>>>
      >>>>>> Correct, but as Tony, Robert and I have noted, a node
      being up does not mean that an application on that node is up,
      which means that your proposed solution is probably a solution to
      the wrong problem.  Further, Robert’s solution is probably a
      solution to the right problem.
      >>>>>>
      >>>>>> Yours Irrespectively,
      >>>>>>
      >>>>>> John
      >>>>>>
      >>>>>>
      >>>>>> From: Aijun Wang <[email protected]>
      >>>>>> Sent: Friday, January 14, 2022 5:53 PM
      >>>>>> To: John E Drake <[email protected]>
      >>>>>> Cc: Robert Raszuk <[email protected]>; Les Ginsberg
      (ginsberg) <[email protected]>; Christian Hopps <
      [email protected]>; Shraddha Hegde <[email protected]>; Tony
      Li <[email protected]>; Hannes Gredler <[email protected]>; lsr <
      [email protected]>; Peter Psenak (ppsenak) <[email protected]>
      >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
      >>>>>>
      >>>>>> Hi, John:
      >>>>>> Please note if the node is down, the service will not be
      accessed.
      >>>>>> We are discussing the “DOWN” notification, not the “UP”
      notification.
      >>>>>>
      >>>>>> Aijun Wang
      >>>>>> China Telecom
      >>>>>>
      >>>>>>
      >>>>>> On Jan 15, 2022, at 00:25, John E Drake <jdrake=
      [email protected]> wrote:
      >>>>>>
      >>>>>>
      >>>>>> Hi,
      >>>>>>
      >>>>>> Comment inline below.
      >>>>>>
      >>>>>> Yours Irrespectively,
      >>>>>>
      >>>>>> John
      >>>>>>
      >>>>>>
      >>>>>> From: Lsr <[email protected]> On Behalf Of Robert
      Raszuk
      >>>>>> Sent: Monday, January 10, 2022 7:15 PM
      >>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
      >>>>>> Cc: Christian Hopps <[email protected]>; Aijun Wang <
      [email protected]>; Shraddha Hegde <[email protected]>;
      Tony Li <[email protected]>; Hannes Gredler <[email protected]>;
      lsr <[email protected]>; Peter Psenak (ppsenak) <[email protected]>
      >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
      >>>>>>
      >>>>>> Hi Les,
      >>>>>>
      >>>>>>> You seem focused on the notification delivery mechanism
      only.
      >>>>>>
      >>>>>> Not really. For me, an advertised summary is like a prefix
      when you are dialing a country code. Call signaling knows to go
      north if you are calling a crab shop in Alaska.
      >>>>>>
      >>>>>> Now such direction does not indicate if the shop is open
      or has crabs.
      >>>>>>
      >>>>>> That info you need to get over the top as a service. So I
      am much more in favor to make the service to tell you directly or
      indirectly that it is available.
      >>>>>>
      >>>>>> [JD]  Right.  Just because a node is up and connected to
      the network does not imply that a given application is active on
      it.
      >>>>>>
      >>>>>> Best,
      >>>>>> R.
      >>>>>>
      >>>>>>
      >>>>>>
      >>>>>>
      >>>>>>
      >>>>>> On Tue, Jan 11, 2022 at 1:07 AM Les Ginsberg (ginsberg) <
      [email protected]> wrote:
      >>>>>> Robert -
      >>>>>>
      >>>>>> From: Robert Raszuk <[email protected]>
      >>>>>> Sent: Monday, January 10, 2022 2:56 PM
      >>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
      >>>>>> Cc: Tony Li <[email protected]>; Christian Hopps <
      [email protected]>; Peter Psenak (ppsenak) <[email protected]>;
      Shraddha Hegde <[email protected]>; Aijun Wang <
      [email protected]>; Hannes Gredler <[email protected]>;
      lsr <[email protected]>
      >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
      >>>>>>
      >>>>>> Les,
      >>>>>>
      >>>>>> We have received requests from real customers who both
      need to summarize AND would like better response time to loss of
      reachability to individual nodes.
      >>>>>>
      >>>>>> We all agree the request is legitimate.
      >>>>>>
      >>>>>> [LES:] It does not seem to me that everyone does agree on
      that – but I appreciate that you agree.
      >>>>>>
      >>>>>> But do they realize that to practically employ what you
      are proposing (new PDU flooding) requires 100% software upgrade
      to all IGP nodes in the entire network ? Do they also realize
      that to effectively use it requires data plane change (sure
      software but data plane code is not as simple as PI) on all
      ingress PEs ?
      >>>>>>
      >>>>>> [LES:] As far as forwarding, as Peter has indicated, we
      have a POC and it works fine. And there are many possible ways
      for implementations to go.
      >>>>>> It may or may not require 100% software upgrade – but I
      agree a significant number of nodes have to be upgraded to at
      least support pulse flooding.
      >>>>>>
      >>>>>>
      >>>>>> And with scale requirements you are describing it seems
      this would be 1000s of nodes (if not more). That's massive if
      compared to alternative approaches to achieve the same or even
      better results.
      >>>>>>
      >>>>>> [LES:] Be happy to review other solutions if/when someone
      writes them up.
      >>>>>> I think what is overlooked in the discussion of other
      solutions is that reachability info is provided by the IGP. If
      all the IGP advertises is a summary then how would individual
      loss of reachability become known at scale?
      >>>>>> You seem focused on the notification delivery mechanism
      only.
      >>>>>>
      >>>>>> Les
      >>>>>>
      >>>>>> Many thx,
      >>>>>> Robert
      >>>>>>
      >>>>>> _______________________________________________
      >>>>>> Lsr mailing list
      >>>>>> [email protected]
      >>>>>> https://www.ietf.org/mailman/listinfo/lsr
      >>>>>
      >>>>
      >>>
      >>
      >

