Peter Psenak <ppse...@cisco.com> writes:
On 24/01/2022 21:27, Christian Hopps wrote:

Peter Psenak <ppsenak=40cisco....@dmarc.ietf.org> writes:

On 24/01/2022 16:19, Christian Hopps wrote:

Peter Psenak <ppsenak=40cisco....@dmarc.ietf.org> writes:

Chris,

On 24/01/2022 10:29, Christian Hopps wrote:

Again KISS applies here: If the summarization process *doesn't work* for a given prefix P, then *don't use summarization* for prefix P!

above simply does not work. 1. so far nobody summarizes and it all works. True, reason being that the number of PEs in the network is typically below 10k. Also summarization with the MPLS data plane is problematic. We are getting requests to design next-gen networks that will include 100k PEs. The summarization is essential in such scale.

So pick a better design -- seriously. It seems it's time to think "outside the single IGP box" to handle the PE to PE functionality when you start contemplating 100k PEs. Especially when every PE probably doesn't need full-mesh hyper-adaptive knowledge of every other PE for doing their jobs.

- multiple IGPs do not solve the problem. You can stick BGP between them, but it brings its own issues. What else do you have?

These aren't my customers bringing these demands to me -- so not my job. :) Maybe this doesn't even need to be done in routing at all.

I thought scaling IGPs to meet the real life deployment requirements is part of the LSR WG charter. If we say "go and fix it elsewhere", maybe we can close the WG.
This is disingenuous. I think it's pretty clear that I'm not saying that "let's not fix scaling issues". I'm saying don't hack the hell out of the protocol to make it do something it's not designed to do. Thanks, Chris.
- the point is we can scale IGPs to these numbers with summarization easily. With summarization you have a perfect topology isolation and decent convergence if we solve the problem in hand.

But that's the point, it's *not* easy at all, you are having to hack the routing protocol into doing non-normal ugly things to make it into a workable solution.

why? Flat ISIS networks with 3k PE nodes have been deployed. In such case, each router sees full topology of 3k+ nodes + 3k PE addresses. With the 100k PE, split in 100 areas with the summarization in place, each router would see 1k nodes in its area + 1,1k prefixes. Less than the above. What is the problem?

- nobody claims every PE needs to talk to every PE. But any PE in any area may need to talk to some PEs from other areas.

Definitely something to consider for looking at better fitting solutions. Putting this in the IGP provides way more than you need, which may be a hint at why that solution is not cleanly working for you.

sorry, I don't get what you are trying to say here. You are claiming something does not work without providing any evidence as why that would be the case.

thanks,
Peter

Thanks,
Chris.

In any case the inelegance of the proposed changes to the routing protocol are a giant flashing red warning light that the chosen design is not the right one.

I don't understand the basis of the above statement.

thanks,
Peter

Thanks,
Chris.
[as wg member]

2. We are only talking about PE addresses here, not the infrastructure links obviously - those are filtered out using other techniques. All the PE addresses are equally important, it's not possible to make only some of them important, while others not.

thanks,
Peter

Thanks,
Chris.
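The per-router state comparison made above (a flat 3k-PE network versus 100k PEs split into 100 areas with summarization) can be checked with simple arithmetic. The following is only a minimal sketch under stated assumptions, not anything posted in the thread: the function name per_router_state is hypothetical, PEs are assumed to be spread evenly across areas, only PE nodes and PE prefixes are counted, and each remote area is assumed to contribute exactly one summary.

```python
def per_router_state(total_pes: int, areas: int, summarized: bool) -> tuple[int, int]:
    """Return (nodes_seen, pe_prefixes_seen) for a router in one area.

    Hypothetical model (assumptions, not from the thread): PEs are spread
    evenly across areas, only PE nodes are counted, and every remote area
    advertises exactly one summary prefix.
    """
    # Topology is contained per area, so a router sees only its own area's nodes.
    nodes = total_pes // areas
    if summarized:
        # Local PE prefixes plus one summary per remote area.
        prefixes = nodes + (areas - 1)
    else:
        # Without summarization every PE prefix is distributed domain-wide.
        prefixes = total_pes
    return nodes, prefixes


flat = per_router_state(3_000, areas=1, summarized=False)
split = per_router_state(100_000, areas=100, summarized=True)
print("flat, 3k PEs:", flat)
print("100k PEs, 100 areas, summarized:", split)
```

Under these assumptions the summarized case gives 1,000 nodes and 1,099 prefixes per router, which is roughly the "1k nodes + 1,1k prefixes" quoted above and below the 3k + 3k of the flat network. The model ignores P routers, infrastructure prefixes, and uneven PE distribution.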
[As wg member]

Best Regards

Aijun Wang
China Telecom

-----Original Message-----
From: Christian Hopps <cho...@chopps.org>
Sent: Monday, January 24, 2022 1:50 PM
To: Gyan Mishra <hayabusa...@gmail.com>
Cc: Christian Hopps <cho...@chopps.org>; Aijun Wang <wangai...@tsinghua.org.cn>; Hannes Gredler <han...@gredler.at>; John E Drake <jdr...@juniper.net>; Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Peter Psenak (ppsenak) <ppse...@cisco.com>; Robert Raszuk <rob...@raszuk.net>; Shraddha Hegde <shrad...@juniper.net>; Tony Li <tony...@tony.li>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] BGP vs PUA/PULSE

Ok, I guess I'll repeat what I said, as I don't believe anything new was presented here.

Yes, having worked intimately with these IGPs for > 20 years now, I understand the use and the implications of using summary routes. :) My opinion remains unchanged.

"If a prefix is important enough to consider seriously hacking the routing protocol to signal the prefix being unreachable, then that prefix is important enough to not summarize to begin with."

IOW; KISS

I'd prefer to not keep repeating this when presented with the same arguments, so please take any silence on my part as my opinion being unchanged.

Thanks,
Chris.
[As WG member]

Gyan Mishra <hayabusa...@gmail.com> writes:

Hi Chris

Just about every vendor out there recommended best practice is to layout address plan to take advantage of summarization wherever possible and that as well includes PE loopback next hop attribute to limit the router load as well as size of LSDB in the backbone as well as domain wide.

I think you would be hard pressed to find any vendor that would say go ahead and flood loopbacks domain wide and don’t summarize.

In large domains flooding domain wide is not feasible and summarization is requirement even for the critical loopback BGP next hops for most operators.

RFC 5302 talks about the ramifications of flooding in ISIS domain in section 1.2 excerpt below:

1.2. Scalability

The disadvantage to performing the domain-wide prefix distribution described above is that it has an impact on the scalability of IS-IS. Areas within IS-IS help scalability in that LSPs are contained within a single area. This limits the size of the link state database, which in turn limits the complexity of the shortest path computation.

Further, the summarization of the prefix information aids scalability in that the abstraction of the prefix information removes the sheer number of data items to be transported and the number of routes to be computed.

It should be noted quite strongly that the distribution of prefixes on a domain-wide basis impacts the scalability of IS-IS in the second respect. It will increase the number of prefixes throughout the domain. This will result in increased memory consumption, transmission requirements, and computation requirements throughout the domain.

It must also be noted that the domain-wide distribution of prefixes has no effect whatsoever on the first aspect of scalability, namely the existence of areas and the limitation of the distribution of the link state database.

Gyan

On Fri, Jan 14, 2022 at 9:07 PM Christian Hopps <cho...@chopps.org> wrote:

Yes, having worked intimately with these IGPs for > 20 years now, I understand the use and the implications of using summary routes. :) My opinion remains unchanged.

Thanks,
Chris.
[as wg member]

> On Jan 14, 2022, at 8:50 PM, Aijun Wang < wangai...@tsinghua.org.cn> wrote:
>
> Hi, Christian:
>
> We should consider the balance and efficiency for the summary or not summary.
> If you don’t summary, then all the areas will be filled with the specified detail routes(all PE’s loopback, may also include all P’s loopback). This can certainly increase the burden of the routers.
>
> But with summary, all these specific routes need not exist in the routing table.
> The nodes within the IGP need only be notified when one node is failure to accelerate the switchover of the overlay service.
> And, you can also select to not using such mechanism, then the service will be blackholed for some time until the service/application find this abnormal phenomenon.
> PUA/PULSE are just the mechanism to reduce the abnormal durations, it is one kind of FRR technique.
>
> Aijun Wang
> China Telecom
>
>> On Jan 15, 2022, at 09:26, Christian Hopps <cho...@chopps.org> wrote:
>>
>>> On Jan 14, 2022, at 8:25 PM, Christian Hopps < cho...@chopps.org> wrote:
>>>
>>> I understand the proposal. As I've stated elsewhere, I do not believe there is a problem here that needs solving. The "problem" was created by the user by summarizing prefixes that should not have been summarized -- they mis-configured their network. The routing protocols works just fine (act very quickly) if you don't incorrectly summarize "really important prefixes".
>>>
>>> I was simply pointing out that IGPs also don't deal in liveness since that keeps coming up.
>>
>> Sorry that was "as wg member".
>>
>>>
>>> Thanks,
>>> Chris.
>>>
>>>>> On Jan 14, 2022, at 8:06 PM, Aijun Wang < wangai...@tsinghua.org.cn> wrote:
>>>>
>>>> Hi, Christian and John:
>>>>
>>>> No. I think you all may misunderstand the proposal. What we are detecting is actually the reachability/liveness of node that connected to the application, not the application itself.
>>>> And, I think the node liveness is same as the node reachability. They will all influence or break the path to their connected service if their forwarding function is failed.
>>>>
>>>> Aijun Wang
>>>> China Telecom
>>>>
>>>>> On Jan 15, 2022, at 08:56, Christian Hopps < cho...@chopps.org> wrote:
>>>>>
>>>>> Indeed, and in fact the IGP should only be dealing with the reachability to the node, not with the node or applications liveness.
>>>>>
>>>>> Thanks,
>>>>> Chris.
>>>>> [as wg member]
>>>>>
>>>>>> On Jan 14, 2022, at 7:47 PM, John E Drake < jdr...@juniper.net> wrote:
>>>>>>
>>>>>> I don’t think so. Today things just work, at a given time scale. What you said you are trying to do is reduce the time scale for detecting that an application on a node has failed. However, conflating the health of a node with the health of an application on that node seems to be inherently flawed.
>>>>>>
>>>>>> Yours Irrespectively,
>>>>>>
>>>>>> John
>>>>>>
>>>>>> Juniper Business Use Only
>>>>>> From: Aijun Wang <wangai...@tsinghua.org.cn>
>>>>>> Sent: Friday, January 14, 2022 7:40 PM
>>>>>> To: John E Drake <jdr...@juniper.net>
>>>>>> Cc: Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Robert Raszuk <rob...@raszuk.net>; Christian Hopps <cho...@chopps.org>; Shraddha Hegde <shrad...@juniper.net>; Tony Li <tony...@tony.li>; Hannes Gredler <han...@gredler.at>; lsr <lsr@ietf.org>; Peter Psenak (ppsenak) <ppse...@cisco.com>
>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>
>>>>>> [External Email. Be cautious of content]
>>>>>>
>>>>>> When the node is up, all the following process are passed to the application layer. This is the normal procedures of the IGP should do.
>>>>>> According to your logic, IGP are solving the wrong problem now?
>>>>>>
>>>>>> Aijun Wang
>>>>>> China Telecom
>>>>>>
>>>>>> On Jan 15, 2022, at 08:30, John E Drake <jdrake= 40juniper....@dmarc.ietf.org> wrote:
>>>>>>
>>>>>> Correct, but as Tony, Robert and I have noted, a node being up does not mean that an application on that node is up, which means that your proposed solution is probably a solution to the wrong problem. Further, Robert’s solution is probably a solution to the right problem.
>>>>>>
>>>>>> Yours Irrespectively,
>>>>>>
>>>>>> John
>>>>>>
>>>>>> Juniper Business Use Only
>>>>>> From: Aijun Wang <wangai...@tsinghua.org.cn>
>>>>>> Sent: Friday, January 14, 2022 5:53 PM
>>>>>> To: John E Drake <jdr...@juniper.net>
>>>>>> Cc: Robert Raszuk <rob...@raszuk.net>; Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Christian Hopps < cho...@chopps.org>; Shraddha Hegde <shrad...@juniper.net>; Tony Li <tony...@tony.li>; Hannes Gredler <han...@gredler.at>; lsr < lsr@ietf.org>; Peter Psenak (ppsenak) <ppse...@cisco.com>
>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>
>>>>>> [External Email. Be cautious of content]
>>>>>>
>>>>>> Hi, John:
>>>>>> Please note if the node is down, the service will not be accessed.
>>>>>> We are discussing the “DOWN” notification, not the “UP” notification.
>>>>>>
>>>>>> Aijun Wang
>>>>>> China Telecom
>>>>>>
>>>>>> On Jan 15, 2022, at 00:25, John E Drake <jdrake= 40juniper....@dmarc.ietf.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Comment inline below.
>>>>>>
>>>>>> Yours Irrespectively,
>>>>>>
>>>>>> John
>>>>>>
>>>>>> Juniper Business Use Only
>>>>>> From: Lsr <lsr-boun...@ietf.org> On Behalf Of Robert Raszuk
>>>>>> Sent: Monday, January 10, 2022 7:15 PM
>>>>>> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
>>>>>> Cc: Christian Hopps <cho...@chopps.org>; Aijun Wang < wangai...@tsinghua.org.cn>; Shraddha Hegde <shrad...@juniper.net >; Tony Li <tony...@tony.li>; Hannes Gredler <han...@gredler.at>; lsr <lsr@ietf.org>; Peter Psenak (ppsenak) <ppse...@cisco.com>
>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>
>>>>>> [External Email. Be cautious of content]
>>>>>>
>>>>>> Hi Les,
>>>>>>
>>>>>>> You seem focused on the notification delivery mechanism only.
>>>>>>
>>>>>> Not really. For me, an advertised summary is like a prefix when you are dialing a country code. Call signaling knows to go north if you are calling a crab shop in Alaska.
>>>>>>
>>>>>> Now such direction does not indicate if the shop is open or has crabs.
>>>>>>
>>>>>> That info you need to get over the top as a service. So I am much more in favor to make the service to tell you directly or indirectly that it is available.
>>>>>>
>>>>>> [JD] Right. Just because a node is up and connected to the network does not imply that a given application is active on it.
>>>>>>
>>>>>> Best,
>>>>>> R.
>>>>>>
>>>>>> On Tue, Jan 11, 2022 at 1:07 AM Les Ginsberg (ginsberg) < ginsb...@cisco.com> wrote:
>>>>>> Robert -
>>>>>>
>>>>>> From: Robert Raszuk <rob...@raszuk.net>
>>>>>> Sent: Monday, January 10, 2022 2:56 PM
>>>>>> To: Les Ginsberg (ginsberg) <ginsb...@cisco.com>
>>>>>> Cc: Tony Li <tony...@tony.li>; Christian Hopps < cho...@chopps.org>; Peter Psenak (ppsenak) <ppse...@cisco.com>; Shraddha Hegde <shrad...@juniper.net>; Aijun Wang < wangai...@tsinghua.org.cn>; Hannes Gredler <han...@gredler.at>; lsr <lsr@ietf.org>
>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>
>>>>>> Les,
>>>>>>
>>>>>> We have received requests from real customers who both need to summarize AND would like better response time to loss of reachability to individual nodes.
>>>>>>
>>>>>> We all agree the request is legitimate.
>>>>>>
>>>>>> [LES:] It does not seem to me that everyone does agree on that – but I appreciate that you agree.
>>>>>>
>>>>>> But do they realize that to practically employ what you are proposing (new PDU flooding) requires 100% software upgrade to all IGP nodes in the entire network ? Do they also realize that to effectively use it requires data plane change (sure software but data plane code is not as simple as PI) on all ingress PEs ?
>>>>>>
>>>>>> [LES:] As far as forwarding, as Peter has indicated, we have a POC and it works fine. And there are many possible ways for implementations to go.
>>>>>> It may or may not require 100% software upgrade – but I agree a significant number of nodes have to be upgraded to at least support pulse flooding.
>>>>>>
>>>>>> And with scale requirements you are describing it seems this would be 1000s of nodes (if not more). That's massive if compared to alternative approaches to achieve the same or even better results.
>>>>>>
>>>>>> [LES:] Be happy to review other solutions if/when someone writes them up.
>>>>>> I think what is overlooked in the discussion of other solutions is that reachability info is provided by the IGP. If all the IGP advertises is a summary then how would individual loss of reachability become known at scale?
>>>>>> You seem focused on the notification delivery mechanism only.
>>>>>>
>>>>>> Les
>>>>>>
>>>>>> Many thx,
>>>>>> Robert
>>>>>>
>>>>>> _______________________________________________
>>>>>> Lsr mailing list
>>>>>> Lsr@ietf.org
>>>>>> https://www.ietf.org/mailman/listinfo/lsr