[Lsr] Re: [Idr] I-D Action: draft-wang-idr-dpf-00.txt

Jeffrey Haas Fri, 05 Dec 2025 07:50:52 -0800

[Speaking as an individual contributor. Note, I'm not an author here.]

Robert,

> On Dec 4, 2025, at 5:17 PM, Robert Raszuk <[email protected]> wrote:
> 
> I am sceptical if the proposed BGP extension is desired in BGP protocol. But 
> this is just my own opinion and as Jeff says I can be in "rough" on it. 

For you to be in the rough, there would have to be enough other opinions on the 
proposal to compare yours to.  It's too early for there to be a rough. :-)

> But reading on your proposal I do think that marking this coloring on a per 
> BGP session basis (strict or loose) is a very bad idea. We have departed from 
> any per session marking when MP BGP Extensions have been introduced. So if 
> you want to continue I recommend a much more granular capability of coloring. 
> Ideally on a per NLRI/UPDATE MSG basis. 

We have a few mechanisms that take advantage of per-session characteristics.  A 
deployed one is RFC 9234 for marking session type.  A non-deployed one covers 
some of the filtering classes covered in wide communities.  

Role-based filtering is probably good generic plumbing.  It does overlap with 
some of our prior ORF cases as well.

For at least the proposed use case, the reason why ORF likely isn't a good fit 
is a desire to simply not bring up the session in some cases when there's a 
role mismatch.
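
A minimal sketch of the session-level check that RFC 9234 performs at OPEN time, to illustrate "don't bring up the session on a role mismatch". The role code points and the allowed pairings follow RFC 9234. The function name `session_may_establish` and the `strict` flag's spelling are assumptions for illustration, and nothing here is taken from the DPF draft.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """BGP Role capability values as assigned in RFC 9234."""
    PROVIDER = 0
    RS = 1
    RS_CLIENT = 2
    CUSTOMER = 3
    PEER = 4


# (local role, remote role) pairs that RFC 9234 permits.
ALLOWED_PAIRS = {
    (Role.PROVIDER, Role.CUSTOMER),
    (Role.CUSTOMER, Role.PROVIDER),
    (Role.RS, Role.RS_CLIENT),
    (Role.RS_CLIENT, Role.RS),
    (Role.PEER, Role.PEER),
}


def session_may_establish(local: Role, remote: Optional[Role],
                          strict: bool = False) -> bool:
    """Return True if the session may come up (hypothetical helper).

    remote is None when the neighbor did not send the Role capability.
    In strict mode that absence is itself a reason to refuse the session;
    otherwise it is ignored and the session proceeds.
    """
    if remote is None:
        return not strict
    # A False result here corresponds to sending a Role Mismatch
    # NOTIFICATION and never reaching Established.
    return (local, remote) in ALLOWED_PAIRS
```

The point of the sketch is that the decision is made once per session, before any routes are exchanged, which is a different shape of mechanism from ORF-style filtering of routes on a session that is already up.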

> With your current proposal you have created a physical partitioning not 
> logical one. 

I agree with that assessment. For the case of BGP fabrics, and likely similar 
overlapping cases for IGP signaled fabrics, this is likely the desired 
property.  

The general discussion happening in several places in the IETF, largely covering 
"AI fabrics", is focusing itself roughly around two large properties:  
Congestion is bad and we need to avoid it.  How do we partition resources and 
schedule traffic on those resources to avoid congestion?[1]

It's pretty obvious that topological separation is one way to avoid certain 
classes of congestion in conjunction with selectively directing traffic that 
will pass over those links.  This is one of the old schools for doing such 
things.  Soft or hard partitioning of such things (e.g. RSVP and policers) is 
another such mechanism, but that's not what's proposed here.

> Also can you elaborate in your draft (keeping in mind BGP native 
> recursiveness) why BGP CAR or BGP CT proposals fail to address your 
> objectives ? Are they broken and need fixing or you just prefer to start 
> fresh with yet one more way to achieve the same ?

I'll let Kevin and his coauthors cover their particular justifications.  Here 
are a few observations from prior IDR list discussion that are probably applicable:

The dialog around "routes with color" did indeed touch on how some links might 
be provisioned with specific colors.  Applying a color to the BGP session was a 
discussed item.  Such a mechanism was determined not to be a requirement for the 
use cases covered, and policy was discussed as being generally adequate to 
permit a link that only wanted to originate or receive routes with specific 
colors to do so.  One might observe that this mechanism would work as an 
easy-mode filter for any of our three "routes with color" mechanisms.

The forwarding plane considerations will drive whatever multi-topology 
mechanisms you're able to deploy.  In the absence of something that can take a 
color association and permit the same destination to be forwarded distinctly 
based on different signaling for that destination in a color, it becomes 
necessary for the operator to take care to provision things disjointly.  
Otherwise, it's necessary to pick your mechanism so that things can cross a 
pinch-point in overlapping topologies.  These things are deployment and 
operational choices, especially since it usually means you're adding to our OSI 
layer cake when it's not "just IP".[2]  (These things are not new 
considerations.)

CPR addressed this by noting that keeping your address ranges distinct for your 
forwarding behaviors addresses the problem.

It could be observed that if you're sticking with boring boxes doing RFC 7938, 
careful choice of your address range and building constrained portions of your 
topology to avoid congestion addresses the use case.  This could be done with 
nothing but policy.  Things that permit an easy mode and reduce provisioning 
complexity (even if machine driven) are helpful.
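
As a rough illustration of the "nothing but policy" approach, the sketch below accepts a route on a session only if its prefix falls inside the address range assigned to that session's plane. The plane names, the address ranges, and the `accept_route` helper are all hypothetical; a real deployment would express this as router import/export policy rather than Python.

```python
import ipaddress

# Hypothetical addressing plan: one disjoint range per constrained plane.
PLANE_RANGES = {
    "red": ipaddress.ip_network("10.1.0.0/16"),
    "blue": ipaddress.ip_network("10.2.0.0/16"),
}


def accept_route(session_plane: str, prefix: str) -> bool:
    """Accept a prefix only on sessions belonging to its plane."""
    net = ipaddress.ip_network(prefix)
    allowed = PLANE_RANGES[session_plane]
    # subnet_of() raises TypeError across address families, so compare
    # the IP version first and reject on a mismatch.
    return net.version == allowed.version and net.subnet_of(allowed)


print(accept_route("red", "10.1.4.0/24"))
print(accept_route("red", "10.2.4.0/24"))
```

Every session needs such a filter provisioned consistently with the addressing plan, which is the provisioning burden a per-session "easy mode" would aim to reduce.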

Adding color to the routes themselves could certainly be done, and deployed in 
conjunction with all of the above. However, this requires addressing how 
forwarding will work at the pinch points if normal IP, or what forwarding is 
like if it's not normal IP.  Perhaps SR-*?  Certainly deploying a route with 
color mechanism is a way to have paths with different congestion properties, 
but that still begs the provisioning (and thus policy filtering) conversation.

The question for "why not routes with color" is thus more "is the forwarding 
looking to intermix traffic for the same IP destination with different 
forwarding characteristics" along with "if yes, what forwarding paradigm do we 
want in fabric topologies"?

Routes with color (CT/CAR varieties) could certainly be used.  I don't know 
that it makes things better. 

-- Jeff

[1] The usual reference to RFC 1925 is appropriate.  This round it's not RSVP 
and ATM. Luckily, enough of the folk who dealt with similar considerations in 
the prior rounds are still active to help steer the conversation out of some of 
the prior dead ends.

[2] There are more than a few older gripes covering BGP fabrics that "if we 
were running MPLS, much of this gets easier".  Insert similar gripes for your 
favorite encapsulation.  "Keep it simple" as "reduce the cost of the platform" 
starts becoming a perverse behavior once you start contorting around the 
"simple" use cases.

> 
> 
> 
> On Thu, Dec 4, 2025 at 10:18 PM Wang, Kevin <[email protected]> wrote:
> Hi Robert,
> 
> Unless we have perfect load balancing, congestion is always possible, even in 
> a non-blocking Clos fabric. Also, there are other scenarios where avoiding 
> fate-sharing paths is crucial.
> 
> Thanks,
> Kevin
> 
> From: Robert Raszuk <[email protected]>
> Date: Wednesday, December 3, 2025 at 3:59 PM
> To: Wang, Kevin <[email protected]>
> Cc: Gyan Mishra <[email protected]>, idr@ietf. org <[email protected]>, lsr 
> <[email protected]>
> Subject: Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
> 
> Hi Kevin,
> 
> Your draft explains how to do poor man's flex algo in BGP - ok. 
> 
> But could you elaborate why anyone would do that (and push more complexity) 
> in a non-blocking CLOS fabric ? 
> 
> Cheers,
> R.
> 
> 
> 
> On Wed, Dec 3, 2025 at 7:35 PM Wang, Kevin <[email protected]> wrote:
> Hi Robert,
> 
> Thank you for providing further details about your thoughts. What I heard is 
> that IGP was not initially adopted in DC fabrics due to its scaling 
> issues (mostly due to LSDB flooding), especially for the hyperscalers. I 
> understand that there were efforts later trying to address the scaling issues 
> from the IGP side. I see your experience of using ISIS to successfully construct 
> the fabric as a good example. Yes, it might be worth writing an ISIS for DC 
> fabrics informational RFC, serving as an alternative to RFC 7938. There are 
> also other efforts trying to bring traffic engineering technologies, such as 
> RSVP, MPTE, etc to the DC fabrics. Like any other networks, the DC fabrics 
> will probably also evolve over time.
> 
> Having said that, most of today’s DC fabrics (at least for those DC customers 
> I have dealt with) are designed following RFC 7938:
>       • Use Clos topology
>       • Use IP forwarding
>       • Use EBGP as the underlay routing protocol
> I guess the choices above are for technical reasons as well as business 
> reasons. BGP DPF is developed under the assumptions/observations above. I 
> agree that the DC fabrics might evolve and adopt other technologies such as 
> IGP, RSVP, in the future. For the time being and the foreseeable future, BGP 
> DPF would help to provide a lightweight traffic engineering for the DC 
> fabrics.
> 
> Thanks,
> Kevin
> 
> From: Robert Raszuk <[email protected]>
> Date: Tuesday, December 2, 2025 at 2:46 PM
> To: Wang, Kevin <[email protected]>
> Cc: Gyan Mishra <[email protected]>, idr@ietf. org <[email protected]>, lsr 
> <[email protected]>
> Subject: Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
> 
> Dear Kevin,
> 
> I know very well what RFC 7938 says. In fact I did review this document well 
> before it became an RFC :) 
> 
> But what happened next is that while RFC7938 makes a valid observation on how 
> one can build MSDCs, lots of folks misinterpreted it as the only guide on how 
> to build even a few racks of DC fabrics. 
> 
> So yes, using BGP to construct dynamic routing in the DC fabrics has its use 
> cases that are really applicable to only a handful of deployments. And I am 
> not aware that any of the MSDCs would be asking you for logical transport 
> planes within their fabrics. 
> 
> All other DCs would be much better off using IGP for underlay and BGP for 
> overlay as a design pattern. 
> 
> When I constructed 10 full racks of hardware using ISIS folks were shocked - 
> and pointed out that I am not using an IETF standard approach :). Then when I 
> demonstrated that connectivity restoration upon any node or link failure is 
> repaired in less than 50 ms the masks went off. 
> 
> Maybe what is actually needed is an informational RFC - just like RFC7938 - 
> simply illustrating that one can construct DC using ISIS. It is obvious to 
> me, but I admit there is no RFC I am aware of to show operators that 
> "Large-Scale Data Centers" can be robustly built with IGPs. 
> 
> Kind regards,
> Robert
> 
> 
> On Tue, Dec 2, 2025 at 7:24 PM Wang, Kevin <[email protected]> wrote:
> Hi Robert and Gyan,
> 
> Thanks for your feedback! Your observation is correct that IGP Flex Algo 
> could achieve the same. BGP DPF can be thought of as a BGP counterpart of IGP 
> Flex Algo to some extent (though not precisely). 
> 
> As explained in the “Introduction” section of this draft, BGP DPF is designed 
> for the current IP fabric environment where EBGP is usually the only protocol 
> used for routing. Section 5 of RFC 7938 explains why DC fabrics use EBGP as 
> the sole routing protocol. 
> 
> Thanks,
> Kevin
> 
> From: Gyan Mishra <[email protected]>
> Date: Tuesday, December 2, 2025 at 7:43 AM
> To: Robert Raszuk <[email protected]>
> Cc: idr@ietf. org <[email protected]>, lsr <[email protected]>
> Subject: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
> 
> I agree with Robert that you could use RFC 9502 IGP Flex Algo in IP networks 
> to build disjoint planes as desired.
> 
> You could also use SRv6 with IGP Flex Algo with SR RFC 9350 which uses IPv6 
> data plane and build your disjoint planes.
> 
> Thanks 
> 
> Gyan
> 
> On Tue, Dec 2, 2025 at 6:32 AM Robert Raszuk <[email protected]> wrote:
> Hi, 
> 
> In respect to the subject draft ... why would you not use IGP Flexible 
> Algorithm for it ? 
> 
> Are you going to port now years of work from IGP to BGP to achieve the same ?
> 
> Besides, in a non-blocking fabric latency is really not a factor. So you want 
> to logically partition it to make it blocking then worry about what travels 
> on which such logical plane ? Is this a reasonable direction ? 
> 
> Thx,
> R.
> 
> ---------- Forwarded message ---------
> From: <[email protected]>
> Date: Mon, Dec 1, 2025 at 10:49 PM
> Subject: I-D Action: draft-wang-idr-dpf-00.txt
> To: <[email protected]>
> 
> 
> Internet-Draft draft-wang-idr-dpf-00.txt is now available.
> 
>    Title:   BGP Deterministic Path Forwarding (DPF)
>    Authors: Kevin Wang
>             Michal Styszynski
>             Wen Lin
>             Mahesh Subramaniam
>             Thomas Kampa
>             Diptanshu Singh
>    Name:    draft-wang-idr-dpf-00.txt
>    Pages:   18
>    Dates:   2025-12-01
> 
> Abstract:
> 
>    Modern data center (DC) fabrics typically employ Clos topologies with
>    External BGP (EBGP) for plain IPv4/IPv6 routing.  While hop-by-hop
>    EBGP routing is simple and scalable, it provides only a single best-
>    effort forwarding service for all types of traffic.  This single
>    best-effort service might be insufficient for increasingly diverse
>    traffic requirements in modern DC environments.  For example, loss
>    and latency sensitive AI/ML flows may demand stronger Service Level
>    Agreements (SLA) than general purpose traffic.  Duplication schemes
>    which are standardized through protocols such as Parallel Redundancy
>    Protocol (PRP) require disjoint forwarding paths to avoid single
>    points of failure.  Congestion avoidance may require more
>    deterministic forwarding behavior.
> 
>    This document introduces BGP Deterministic Path Forwarding (DPF), a
>    mechanism that partitions the physical fabric into multiple logical
>    fabrics.  Flows can be mapped to different logical fabrics based on
>    their specific requirements, enabling deterministic forwarding
>    behavior within the data center.
> 
> The IETF datatracker status page for this Internet-Draft is:
> https://datatracker.ietf.org/doc/draft-wang-idr-dpf/
> 
> There is also an HTML version available at:
> https://www.ietf.org/archive/id/draft-wang-idr-dpf-00.html
> 
> Internet-Drafts are also available by rsync at:
> rsync.ietf.org::internet-drafts
> 
> 
> _______________________________________________
> I-D-Announce mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> _______________________________________________
> Idr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
