Re: [Lsr] BGP vs PUA/PULSE

Christian Hopps Sun, 23 Jan 2022 22:06:11 -0800


Ok, I guess I'll repeat what I said, as I don't believe anything new was 
presented here.


   Yes, having worked intimately with these IGPs for > 20 years now,
   I understand the use and the implications of using summary
   routes. :)

   My opinion remains unchanged.

"If a prefix is important enough to consider seriously hacking the routing protocol 
to signal the prefix being unreachable, then that prefix is important enough to not 
summarize to begin with." IOW; KISS

I'd prefer to not keep repeating this when presented with the same arguments, 
so please take any silence on my part as my opinion being unchanged.

Thanks,
Chris.
[As WG member]
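
To make the trade-off debated in this thread concrete, here is a minimal sketch in Python using the standard `ipaddress` module. It only models which advertised prefix a remote router would match for a failed PE loopback, with and without summarization. The prefixes, the router names and the `remote_view` helper are all hypothetical and chosen for illustration; this is not how any IGP implementation works internally.

```python
import ipaddress

# Hypothetical addressing: one area summary and three PE loopbacks inside it.
summary = ipaddress.ip_network("192.0.2.0/24")
loopbacks = {
    "pe1": ipaddress.ip_network("192.0.2.1/32"),
    "pe2": ipaddress.ip_network("192.0.2.2/32"),
    "pe3": ipaddress.ip_network("192.0.2.3/32"),
}


def remote_view(advertised, target):
    """Return the longest advertised prefix that covers target, or None."""
    matches = [p for p in advertised if target.subnet_of(p)]
    return max(matches, key=lambda p: p.prefixlen, default=None)


# Assume pe2 fails: its /32 is withdrawn inside its own area.
alive = {name: p for name, p in loopbacks.items() if name != "pe2"}

# What a router in another area sees in each design.
with_summary = [summary]                # only the summary is advertised
without_summary = list(alive.values())  # every surviving /32 is advertised

target = loopbacks["pe2"]
summarized_match = remote_view(with_summary, target)
unsummarized_match = remote_view(without_summary, target)

print("summarized:  ", summarized_match)
print("unsummarized:", unsummarized_match)
```

In the summarized case the failed loopback still matches the /24, so the remote router has no IGP signal that the next hop is gone. In the unsummarized case the match disappears as soon as the /32 is withdrawn, at the cost of carrying every /32 domain wide.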



Gyan Mishra <[email protected]> writes:

Hi Chris 


Just about every vendor's recommended best practice is to lay out the
address plan to take advantage of summarization wherever possible,
and that includes the PE loopbacks used as BGP next hops, to limit
router load as well as the size of the LSDB, both in the backbone and
domain wide.

I think you would be hard pressed to find any vendor that would say
go ahead and flood loopbacks domain wide and don’t summarize.

In large domains, flooding domain wide is not feasible, and
summarization is a requirement even for the critical loopback BGP next
hops for most operators.

RFC 5302 discusses the ramifications of domain-wide flooding in an
IS-IS domain in Section 1.2, excerpted below:


1.2.  Scalability

   The disadvantage to performing the domain-wide prefix distribution
   described above is that it has an impact on the scalability of IS-IS.
   Areas within IS-IS help scalability in that LSPs are contained within
   a single area.  This limits the size of the link state database,
   which in turn limits the complexity of the shortest path computation.

   Further, the summarization of the prefix information aids scalability
   in that the abstraction of the prefix information removes the sheer
   number of data items to be transported and the number of routes to be
   computed.

   It should be noted quite strongly that the distribution of prefixes
   on a domain-wide basis impacts the scalability of IS-IS in the second
   respect.  It will increase the number of prefixes throughout the
   domain.  This will result in increased memory consumption,
   transmission requirements, and computation requirements throughout
   the domain.

   It must also be noted that the domain-wide distribution of prefixes
   has no effect whatsoever on the first aspect of scalability, namely
   the existence of areas and the limitation of the distribution of the
   link state database.




Gyan
On Fri, Jan 14, 2022 at 9:07 PM Christian Hopps <[email protected]>
wrote:

    Yes, having worked intimately with these IGPs for > 20 years now,
    I understand the use and the implications of using summary
    routes. :)

    My opinion remains unchanged.

    Thanks,
    Chris.
    [as wg member]

    > On Jan 14, 2022, at 8:50 PM, Aijun Wang <
    [email protected]> wrote:
    >
    > Hi, Christian:
    >
    > We should weigh the balance and efficiency of summarizing
    versus not summarizing.
    > If you don’t summarize, then all the areas will be filled with
    the specific detailed routes (all PEs’ loopbacks, possibly also
    all Ps’ loopbacks). This can certainly increase the burden on the
    routers.
    >
    > But with summarization, these specific routes need not exist in
    the routing table. The nodes within the IGP need only be notified
    when a node fails, to accelerate the switchover of the
    overlay service.
    > And you can also choose not to use such a mechanism; then the
    service will be blackholed for some time until the service/
    application detects the failure.
    > PUA/PULSE are just mechanisms to reduce the duration of such
    failures; they are one kind of FRR technique.
    >
    > Aijun Wang
    > China Telecom
    >
    >> On Jan 15, 2022, at 09:26, Christian Hopps <[email protected]>
    wrote:
    >>
    >>
    >>
    >>> On Jan 14, 2022, at 8:25 PM, Christian Hopps <
    [email protected]> wrote:
    >>>
    >>> I understand the proposal. As I've stated elsewhere, I do not
    believe there is a problem here that needs solving. The "problem"
    was created by the user by summarizing prefixes that should not
    have been summarized -- they mis-configured their network. The
    routing protocols work just fine (they act very quickly) if you don't
    incorrectly summarize "really important prefixes".
    >>>
    >>> I was simply pointing out that IGPs also don't deal in
    liveness since that keeps coming up.
    >>
    >> Sorry that was "as wg member".
    >>
    >>>
    >>> Thanks,
    >>> Chris.
    >>>
    >>>>> On Jan 14, 2022, at 8:06 PM, Aijun Wang <
    [email protected]> wrote:
    >>>>
    >>>> Hi, Christian and John:
    >>>>
    >>>> No. I think you all may misunderstand the proposal. What we
    are detecting is actually the reachability/liveness of the node that
    is connected to the application, not the application itself.
    >>>> And I think node liveness is the same as node
    reachability. Either will influence or break the path to the
    connected service if the node's forwarding function fails.
    >>>>
    >>>> Aijun Wang
    >>>> China Telecom
    >>>>
    >>>>> On Jan 15, 2022, at 08:56, Christian Hopps <
    [email protected]> wrote:
    >>>>>
    >>>>> Indeed, and in fact the IGP should only be dealing with the
    reachability to the node, not with the node's or application's
    liveness.
    >>>>>
    >>>>> Thanks,
    >>>>> Chris.
    >>>>> [as wg member]
    >>>>>
    >>>>>> On Jan 14, 2022, at 7:47 PM, John E Drake <
    [email protected]> wrote:
    >>>>>>
    >>>>>> I don’t think so.  Today things just work, at a given time
    scale.  What you said you are trying to do is reduce the time
    scale for detecting that an application on a node has failed. 
    However, conflating the health of a node with the health of an
    application on that node seems to be inherently flawed.   
    >>>>>>
    >>>>>> Yours Irrespectively,
    >>>>>>
    >>>>>> John
    >>>>>>
    >>>>>>
    >>>>>> From: Aijun Wang <[email protected]>
    >>>>>> Sent: Friday, January 14, 2022 7:40 PM
    >>>>>> To: John E Drake <[email protected]>
    >>>>>> Cc: Les Ginsberg (ginsberg) <[email protected]>; Robert
    Raszuk <[email protected]>; Christian Hopps <[email protected]>;
    Shraddha Hegde <[email protected]>; Tony Li <[email protected]>;
    Hannes Gredler <[email protected]>; lsr <[email protected]>; Peter
    Psenak (ppsenak) <[email protected]>
    >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
    >>>>>>
    >>>>>> When the node is up, all subsequent processing is passed
    to the application layer. This is the normal procedure the
    IGP should follow.
    >>>>>> According to your logic, is the IGP solving the wrong problem
    now?
    >>>>>>
    >>>>>> Aijun Wang
    >>>>>> China Telecom
    >>>>>>
    >>>>>>
    >>>>>> On Jan 15, 2022, at 08:30, John E Drake <jdrake=
    [email protected]> wrote:
    >>>>>>
    >>>>>>
    >>>>>> Correct, but as Tony, Robert and I have noted, a node
    being up does not mean that an application on that node is up,
    which means that your proposed solution is probably a solution to
    the wrong problem.  Further, Robert’s solution is probably a
    solution to the right problem.
    >>>>>>
    >>>>>> Yours Irrespectively,
    >>>>>>
    >>>>>> John
    >>>>>>
    >>>>>>
    >>>>>> From: Aijun Wang <[email protected]>
    >>>>>> Sent: Friday, January 14, 2022 5:53 PM
    >>>>>> To: John E Drake <[email protected]>
    >>>>>> Cc: Robert Raszuk <[email protected]>; Les Ginsberg
    (ginsberg) <[email protected]>; Christian Hopps <
    [email protected]>; Shraddha Hegde <[email protected]>; Tony
    Li <[email protected]>; Hannes Gredler <[email protected]>; lsr <
    [email protected]>; Peter Psenak (ppsenak) <[email protected]>
    >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
    >>>>>>
    >>>>>> Hi, John:
    >>>>>> Please note that if the node is down, the service cannot be
    accessed.
    >>>>>> We are discussing the “DOWN” notification, not the “UP”
    notification.
    >>>>>>
    >>>>>> Aijun Wang
    >>>>>> China Telecom
    >>>>>>
    >>>>>>
    >>>>>> On Jan 15, 2022, at 00:25, John E Drake <jdrake=
    [email protected]> wrote:
    >>>>>>
    >>>>>>
    >>>>>> Hi,
    >>>>>>
    >>>>>> Comment inline below.
    >>>>>>
    >>>>>> Yours Irrespectively,
    >>>>>>
    >>>>>> John
    >>>>>>
    >>>>>>
    >>>>>> From: Lsr <[email protected]> On Behalf Of Robert
    Raszuk
    >>>>>> Sent: Monday, January 10, 2022 7:15 PM
    >>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
    >>>>>> Cc: Christian Hopps <[email protected]>; Aijun Wang <
    [email protected]>; Shraddha Hegde <[email protected]
    >; Tony Li <[email protected]>; Hannes Gredler <[email protected]>;
    lsr <[email protected]>; Peter Psenak (ppsenak) <[email protected]>
    >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
    >>>>>>
    >>>>>> Hi Les,
    >>>>>>
    >>>>>>> You seem focused on the notification delivery mechanism
    only.
    >>>>>>
    >>>>>> Not really. For me, an advertised summary is like a prefix
    when you are dialing a country code. Call signaling knows to go
    north if you are calling a crab shop in Alaska.
    >>>>>>
    >>>>>> Now such direction does not indicate if the shop is open
    or has crabs.
    >>>>>>
    >>>>>> That info you need to get over the top as a service. So I
    am much more in favor of having the service tell you directly or
    indirectly that it is available.
    >>>>>>
    >>>>>> [JD]  Right.  Just because a node is up and connected to
    the network does not imply that a given application is active on
    it.
    >>>>>>
    >>>>>> Best,
    >>>>>> R.
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>>
    >>>>>> On Tue, Jan 11, 2022 at 1:07 AM Les Ginsberg (ginsberg) <
    [email protected]> wrote:
    >>>>>> Robert -
    >>>>>>
    >>>>>> From: Robert Raszuk <[email protected]>
    >>>>>> Sent: Monday, January 10, 2022 2:56 PM
    >>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
    >>>>>> Cc: Tony Li <[email protected]>; Christian Hopps <
    [email protected]>; Peter Psenak (ppsenak) <[email protected]>;
    Shraddha Hegde <[email protected]>; Aijun Wang <
    [email protected]>; Hannes Gredler <[email protected]>;
    lsr <[email protected]>
    >>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
    >>>>>>
    >>>>>> Les,
    >>>>>>
    >>>>>> We have received requests from real customers who both
    need to summarize AND would like better response time to loss of
    reachability to individual nodes.
    >>>>>>
    >>>>>> We all agree the request is legitimate.
    >>>>>>
    >>>>>> [LES:] It does not seem to me that everyone does agree on
    that – but I appreciate that you agree.
    >>>>>>
    >>>>>> But do they realize that practically employing what you
    are proposing (new PDU flooding) requires a 100% software upgrade
    of all IGP nodes in the entire network? Do they also realize
    that using it effectively requires a data plane change (software,
    sure, but data plane code is not as simple as PI) on all
    ingress PEs?
    >>>>>>
    >>>>>> [LES:] As far as forwarding, as Peter has indicated, we
    have a POC and it works fine. And there are many possible ways
    for implementations to go.
    >>>>>> It may or may not require 100% software upgrade – but I
    agree a significant number of nodes have to be upgraded to at
    least support pulse flooding.
    >>>>>>
    >>>>>>
    >>>>>> And with scale requirements you are describing it seems
    this would be 1000s of nodes (if not more). That's massive if
    compared to alternative approaches to achieve the same or even
    better results.
    >>>>>>
    >>>>>> [LES:] Be happy to review other solutions if/when someone
    writes them up.
    >>>>>> I think what is overlooked in the discussion of other
    solutions is that reachability info is provided by the IGP. If
    all the IGP advertises is a summary then how would individual
    loss of reachability become known at scale?
    >>>>>> You seem focused on the notification delivery mechanism
    only.
    >>>>>>
    >>>>>> Les
    >>>>>>
    >>>>>> Many thx,
    >>>>>> Robert
    >>>>>>
    >>>>>> _______________________________________________
    >>>>>> Lsr mailing list
    >>>>>> [email protected]
    >>>>>> https://www.ietf.org/mailman/listinfo/lsr
    >>>>>
    >>>>
    >>>
    >>
    >

