Hi Robert,

Unless we have perfect load balancing, congestion is always possible, even in a non-blocking Clos fabric. Also, there are other scenarios where avoiding fate-sharing paths is crucial.

Thanks,
Kevin

From: Robert Raszuk <[email protected]>
Date: Wednesday, December 3, 2025 at 3:59 PM
To: Wang, Kevin <[email protected]>
Cc: Gyan Mishra <[email protected]>, idr@ietf.org <[email protected]>, lsr <[email protected]>
Subject: Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt

Hi Kevin,

Your draft explains how to do poor man's flex algo in BGP - ok.

But could you elaborate why anyone would do that (and push more complexity) in a non-blocking Clos fabric?

Cheers,
R.

On Wed, Dec 3, 2025 at 7:35 PM Wang, Kevin <[email protected]> wrote:

Hi Robert,

Thank you for providing further details about your thoughts.

What I heard is that IGP was not initially adopted in DC fabrics because of its scaling issues (mostly LSDB flooding), especially for the hyperscalers. I understand that there were later efforts to address the scaling issues from the IGP side. I see your experience of using ISIS to successfully construct the fabric as a good example. Yes, it might be worth writing an informational "ISIS for DC fabrics" RFC, serving as an alternative to RFC 7938. There are also other efforts trying to bring traffic engineering technologies, such as RSVP, MPTE, etc., to the DC fabrics. Like any other network, the DC fabrics will probably also evolve over time.

Having said that, most of today’s DC fabrics (at least for those DC customers I have dealt with) are designed following RFC 7938:

* Use Clos topology
* Use IP forwarding
* Use EBGP as the underlay routing protocol

I guess the choices above are for technical reasons as well as business reasons. BGP DPF is developed under the assumptions/observations above. I agree that the DC fabrics might evolve and adopt other technologies, such as IGP and RSVP, in the future. For the time being and the foreseeable future, BGP DPF would help to provide lightweight traffic engineering for the DC fabrics.

Thanks,
Kevin

From: Robert Raszuk <[email protected]>
Date: Tuesday, December 2, 2025 at 2:46 PM
To: Wang, Kevin <[email protected]>
Cc: Gyan Mishra <[email protected]>, idr@ietf.org <[email protected]>, lsr <[email protected]>
Subject: Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt

Dear Kevin,

I know very well what RFC 7938 says. In fact I did review this document well before it became an RFC :)

But what happened next is that while RFC 7938 makes a valid observation on how one can build MSDCs, lots of folks misinterpreted it as the only guide on how to build even a few racks of DC fabric.

So yes, using BGP to construct dynamic routing in the DC fabrics has its use cases that are really applicable to only a handful of deployments. And I am not aware that any of the MSDCs would be asking you for logical transport planes within their fabrics.

All other DCs would be much better off using IGP for underlay and BGP for overlay as a design pattern.

When I constructed 10 full racks of hardware using ISIS, folks were shocked - and pointed out that I am not using an IETF standard approach :). Then when I demonstrated that connectivity is restored in less than 50 ms upon any node or link failure, the masks went off.

Maybe what is actually needed is an informational RFC - just like RFC 7938 - simply illustrating that one can construct a DC using ISIS. It is obvious to me, but I admit there is no RFC I am aware of to show operators that "Large-Scale Data Centers" can be robustly built with IGPs.

Kind regards,
Robert

On Tue, Dec 2, 2025 at 7:24 PM Wang, Kevin <[email protected]> wrote:

Hi Robert and Gyan,

Thanks for your feedback!

Your observation is correct that IGP Flex Algo could achieve the same. BGP DPF can be thought of as a BGP counterpart of IGP Flex Algo to some extent (though not precisely).
As explained in the “Introduction” section of this draft, BGP DPF is designed for the current IP fabric environment where EBGP is usually the only protocol used for routing. Section 5 of RFC 7938 explains why DC fabrics use EBGP as the sole routing protocol.

Thanks,
Kevin

From: Gyan Mishra <[email protected]>
Date: Tuesday, December 2, 2025 at 7:43 AM
To: Robert Raszuk <[email protected]>
Cc: idr@ietf.org <[email protected]>, lsr <[email protected]>
Subject: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt

I agree with Robert that you could use RFC 9502 IGP Flex Algo in IP networks to build disjoint planes as desired.

You could also use SRv6 with IGP Flex Algo for SR (RFC 9350), which uses the IPv6 data plane, and build your disjoint planes.

Thanks
Gyan

On Tue, Dec 2, 2025 at 6:32 AM Robert Raszuk <[email protected]> wrote:

Hi,

In respect to the subject draft ... why would you not use IGP Flexible Algorithm for it? Are you now going to port years of work from IGP to BGP to achieve the same?

Besides, in a non-blocking fabric latency is really not a factor. So you want to logically partition it to make it blocking, then worry about what travels on which such logical plane? Is this a reasonable direction?

Thx,
R.

---------- Forwarded message ---------
From: <[email protected]>
Date: Mon, Dec 1, 2025 at 10:49 PM
Subject: I-D Action: draft-wang-idr-dpf-00.txt
To: <[email protected]>

Internet-Draft draft-wang-idr-dpf-00.txt is now available.

Title:   BGP Deterministic Path Forwarding (DPF)
Authors: Kevin Wang
         Michal Styszynski
         Wen Lin
         Mahesh Subramaniam
         Thomas Kampa
         Diptanshu Singh
Name:    draft-wang-idr-dpf-00.txt
Pages:   18
Dates:   2025-12-01

Abstract:

Modern data center (DC) fabrics typically employ Clos topologies with External BGP (EBGP) for plain IPv4/IPv6 routing.
While hop-by-hop EBGP routing is simple and scalable, it provides only a single best-effort forwarding service for all types of traffic. This single best-effort service might be insufficient for increasingly diverse traffic requirements in modern DC environments. For example, loss and latency sensitive AI/ML flows may demand stronger Service Level Agreements (SLA) than general purpose traffic. Duplication schemes which are standardized through protocols such as Parallel Redundancy Protocol (PRP) require disjoint forwarding paths to avoid single points of failure. Congestion avoidance may require more deterministic forwarding behavior. This document introduces BGP Deterministic Path Forwarding (DPF), a mechanism that partitions the physical fabric into multiple logical fabrics. Flows can be mapped to different logical fabrics based on their specific requirements, enabling deterministic forwarding behavior within the data center.

The IETF datatracker status page for this Internet-Draft is:
https://datatracker.ietf.org/doc/draft-wang-idr-dpf/

There is also an HTML version available at:
https://www.ietf.org/archive/id/draft-wang-idr-dpf-00.html

Internet-Drafts are also available by rsync at:
rsync.ietf.org::internet-drafts

_______________________________________________
I-D-Announce mailing list -- [email protected]
To unsubscribe send an email to [email protected]
_______________________________________________
Idr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
_______________________________________________ Lsr mailing list -- [email protected] To unsubscribe send an email to [email protected]
