[Lsr] Re: [Idr] I-D Action: draft-wang-idr-dpf-00.txt

Robert Raszuk Fri, 05 Dec 2025 08:22:32 -0800

Thank you Jeff for your good comments.

Just want to respond to one point made:




> > With your current proposal you have created a physical partitioning not
> > logical one.
>
> I agree with that assessment. For the case of BGP fabrics, and likely
> similar overlapping cases for IGP signaled fabrics, this is likely the
> desired property.

I think history has taught us that hard-partitioning the IP fabric is, to put it
softly, a suboptimal choice for any solution.

And with IGPs you always run your base topology across all links and nodes.
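To put rough numbers on that, here is a minimal sketch, assuming a hypothetical two-tier Clos in which every leaf connects to every one of four spines and a hard "plane" is simply the subset of spines whose links carry a given color. The spine names, the red/blue plane assignment and the helper functions are illustrative only; they are not taken from the draft or from any implementation.

```python
"""Toy comparison: base topology vs. a hard-partitioned plane in a 2-tier Clos.

Hypothetical model for illustration only: every leaf connects to every spine,
so each leaf-to-leaf ECMP path is identified by the spine it transits.
"""

# Hypothetical spines and a hypothetical hard partition into two color planes.
SPINES = ["s1", "s2", "s3", "s4"]
PLANE_OF = {"s1": "red", "s2": "red", "s3": "blue", "s4": "blue"}


def ecmp_paths(failed=frozenset(), plane=None):
    """Return the spines usable for leaf-to-leaf traffic.

    plane=None models the base topology (all links usable); otherwise only
    spines assigned to that plane may carry the traffic.
    """
    return [
        s for s in SPINES
        if s not in failed and (plane is None or PLANE_OF[s] == plane)
    ]


def surviving_fraction(failed, plane=None):
    """Fraction of the plane's (or base topology's) paths left after failures."""
    return len(ecmp_paths(failed, plane)) / len(ecmp_paths(plane=plane))


if __name__ == "__main__":
    down = frozenset({"s1"})
    for name, plane in (("base", None), ("red plane", "red")):
        print(f"{name}: {len(ecmp_paths(plane=plane))} paths, "
              f"{surviving_fraction(down, plane):.0%} left after s1 fails")
```

In this toy model the base topology has four equal-cost leaf-to-leaf paths and keeps 75% of them when s1 fails, while the two-spine red plane starts with two paths and loses half of them to the same failure; under hard partitioning the blue plane's spare paths cannot be used by red traffic. Real fabrics have more tiers, unequal link capacities and policy, so treat this only as back-of-the-envelope arithmetic.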

Thx,
R.




On Fri, Dec 5, 2025 at 4:50 PM Jeffrey Haas <[email protected]> wrote:

> [Speaking as an individual contributor. Note, I'm not an author here.]
>
> Robert,
>
>
> > On Dec 4, 2025, at 5:17 PM, Robert Raszuk <[email protected]> wrote:
> >
> > I am sceptical that the proposed BGP extension is desirable in the BGP protocol.
> But this is just my own opinion, and as Jeff says, I may be in the "rough" on it.
>
> For you to be in the rough, there would have to be enough other opinions on
> the proposal to compare yours to.  It's too early for there to be a rough.
> :-)
>
> > But having read your proposal, I do think that marking this coloring on a
> per-BGP-session basis (strict or loose) is a very bad idea. We
> departed from per-session marking when the MP-BGP extensions were
> introduced. So if you want to continue, I recommend a much more granular
> coloring capability, ideally on a per-NLRI/UPDATE-message basis.
>
> We have a few mechanisms that take advantage of per-session
> characteristics.  A deployed one is RFC 9234 for marking session type.  A
> non-deployed one can be found in some of the filtering classes defined in
> wide communities.
>
> Role-based filtering is probably good generic plumbing.  It does overlap
> with some of our prior ORF cases as well.
>
> For at least the proposed use case, the reason why ORF likely isn't a good
> fit is a desire to simply not bring up the session in some cases when
> there's a role mismatch.
>
> > With your current proposal you have created a physical partitioning not
> logical one.
>
> I agree with that assessment. For the case of BGP fabrics, and likely
> similar overlapping cases for IGP signaled fabrics, this is likely the
> desired property.
>
> The general discussion happening several places in the IETF largely
> covering "AI fabrics" is focusing itself roughly around two large
> properties:  Congestion is bad and we need to avoid it.  How do we
> partition resources and schedule traffic on those resources to avoid
> congestion?[1]
>
> It's pretty obvious that topological separation is one way to avoid
> certain classes of congestion in conjunction with selectively directing
> traffic that will pass over those links.  This is one of the old schools
> for doing such things.  Soft or hard partitioning of such things (e.g. RSVP
> and policers) is another such mechanism, but that's not what's proposed
> here.
>
> > Also, can you elaborate in your draft (keeping in mind BGP's native
> recursiveness) why the BGP CAR or BGP CT proposals fail to address your
> objectives? Are they broken and in need of fixing, or do you just prefer to
> start fresh with yet another way to achieve the same thing?
>
> I'll let Kevin and his coauthors cover their particular justifications.
> Here are a few observations from prior IDR list discussions that are probably
> applicable:
>
> The dialog around "routes with color" did indeed touch how some links
> might be provisioned with specific colors.  Applying a color to the BGP
> session was a discussed item.  For the use cases covered, such a mechanism
> was determined not to be a requirement, and policy was discussed as being
> generally adequate to permit a link that only wanted to originate or
> receive routes with specific colors to do so.  One might observe that this
> mechanism would work as an easy-mode filter for any of our three "routes
> with color" mechanisms.
>
> The forwarding plane considerations will drive whatever multi-topology
> mechanisms you're able to deploy.  In the absence of something that can
> take a color association and permit the same destination to be forwarded
> distinctly based on different signaling for that destination in a color, it
> becomes necessary for the operator to take care to provision things
> disjointly.  Otherwise, it's necessary to pick your mechanism so that
> things can cross a pinch-point in overlapping topologies.  These things are
> deployment and operational choices, especially since it usually means
> you're adding to our OSI layer cake when it's not "just IP".[2]  (These
> things are not new considerations.)
>
> CPR addressed this by noting that keeping your address ranges distinct for
> your forwarding behaviors addresses the problem.
>
> It could be observed that if you're sticking with boring boxes doing RFC
> 7938, careful choice of your address range and building constrained
> portions of your topology to avoid congestion addresses the use case.  This
> could be done with nothing but policy.  Things that permit an easy mode to
> reduce provisioning complexity (even if machine-driven) are helpful.
>
> Adding color to the routes themselves could certainly be done, and
> deployed in conjunction with all of the above. However, this requires
> addressing how forwarding will work at the pinch points if normal IP, or
> what forwarding is like if it's not normal IP.  Perhaps SR-*?  Certainly
> deploying a route with color mechanism is a way to have paths with
> different congestion properties, but that still begs the provisioning (and
> thus policy filtering) conversation.
>
> The question for "why not routes with color" is thus more "is the
> forwarding looking to intermix traffic for the same IP destination with
> different forwarding characteristics" along with "if yes, what forwarding
> paradigm do we want in fabric topologies"?
>
> Routes with color (CT/CAR varieties) could certainly be used.  I don't
> know that it makes things better.
>
> -- Jeff
>
> [1] The usual reference to RFC 1925 is appropriate.  This round it's not
> RSVP and ATM. Luckily enough of the folk who dealt with similar
> considerations the prior rounds are still active to help steer the
> conversation out of some of the prior dead ends.
>
> [2] There are more than a few older gripes covering BGP fabrics that "if
> we were running MPLS, much of this gets easier".  Insert similar gripes for
> your favorite encapsulation.  "Keep it simple" as "reduce the cost of the
> platform" starts becoming a perverse behavior once you start contorting
> around the "simple" use cases.
>
> >
> >
> >
> > On Thu, Dec 4, 2025 at 10:18 PM Wang, Kevin <[email protected]> wrote:
> > Hi Robert,
> >
> > Unless we have perfect load balancing, congestion is always possible,
> even in a non-blocking Clos fabric. Also, there are other scenarios where
> avoiding fate-sharing paths is crucial.
> >
> > Thanks,
> > Kevin
> >
> > From: Robert Raszuk <[email protected]>
> > Date: Wednesday, December 3, 2025 at 3:59 PM
> > To: Wang, Kevin <[email protected]>
> > Cc: Gyan Mishra <[email protected]>, idr@ietf.org <[email protected]>,
> lsr <[email protected]>
> > Subject: Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
> >
> > Hi Kevin,
> >
> > Your draft explains how to do poor man's flex algo in BGP - ok.
> >
> > But could you elaborate on why anyone would do that (and push more
> complexity) in a non-blocking Clos fabric?
> >
> > Cheers,
> > R.
> >
> >
> >
> > On Wed, Dec 3, 2025 at 7:35 PM Wang, Kevin <[email protected]> wrote:
> > Hi Robert,
> >
> > Thank you for providing further details about your thoughts. What I
> heard is that IGPs were not initially adopted in DC fabrics due to their
> scaling issues (mostly LSDB flooding), especially for the
> hyperscalers. I understand that there were later efforts to address
> the scaling issues on the IGP side. I see your experience of using ISIS to
> successfully construct the fabric as a good example. Yes, it might be worth
> writing an informational RFC on ISIS for DC fabrics, serving as an
> alternative to RFC 7938. There are also other efforts trying to bring
> traffic engineering technologies, such as RSVP, MPTE, etc., to DC
> fabrics. Like any other networks, DC fabrics will probably also evolve
> over time.
> >
> > Having said that, most of today’s DC fabrics (at least for those DC
> customers I have dealt with) are designed following RFC 7938:
> >       • Use Clos topology
> >       • Use IP forwarding
> >       • Use EBGP as the underlay routing protocol
> > I guess the choices above were made for technical as well as business
> reasons. BGP DPF is developed under the assumptions/observations above. I
> agree that DC fabrics might evolve and adopt other technologies, such as
> IGPs or RSVP, in the future. For the time being and the foreseeable future,
> BGP DPF would help to provide lightweight traffic engineering for DC
> fabrics.
> >
> > Thanks,
> > Kevin
> >
> > From: Robert Raszuk <[email protected]>
> > Date: Tuesday, December 2, 2025 at 2:46 PM
> > To: Wang, Kevin <[email protected]>
> > Cc: Gyan Mishra <[email protected]>, idr@ietf.org <[email protected]>,
> lsr <[email protected]>
> > Subject: Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
> >
> > Dear Kevin,
> >
> > I know very well what RFC 7938 says. In fact I did review this document
> well before it became an RFC :)
> >
> > But what happened next is that, while RFC 7938 makes a valid observation on
> how one can build MSDCs, lots of folks misinterpreted it as the only guide
> on how to build even a few racks of DC fabric.
> >
> > So yes, using BGP to construct dynamic routing in the DC fabrics has its
> use cases that are really applicable to only a handful of deployments. And
> I am not aware that any of the MSDCs would be asking you for logical
> transport planes within their fabrics.
> >
> > All other DCs would be much better off using IGP for underlay and BGP
> for overlay as a design pattern.
> >
> > When I constructed 10 full racks of hardware using ISIS, folks were
> shocked and pointed out that I was not using an IETF standard approach :).
> Then when I demonstrated that connectivity upon any node or link failure is
> restored in less than 50 ms, the masks went off.
> >
> > Maybe what is actually needed is an informational RFC, just like
> RFC 7938, simply illustrating that one can construct a DC using ISIS. It is
> obvious to me, but I admit there is no RFC I am aware of that shows operators
> that "Large-Scale Data Centers" can be robustly built with IGPs.
> >
> > Kind regards,
> > Robert
> >
> >
> > On Tue, Dec 2, 2025 at 7:24 PM Wang, Kevin <[email protected]> wrote:
> > Hi Robert and Gyan,
> >
> > Thanks for your feedback! Your observation is correct that IGP Flex Algo
> could achieve the same. BGP DPF can be thought of as a BGP counterpart of IGP
> Flex Algo to some extent (though not precisely).
> >
> > As explained in the “Introduction” section of this draft, BGP DPF is
> designed for the current IP fabric environment where EBGP is usually the
> only protocol used for routing. Section 5 of RFC 7938 explains why DC
> fabrics use EBGP as the sole routing protocol.
> >
> > Thanks,
> > Kevin
> >
> > From: Gyan Mishra <[email protected]>
> > Date: Tuesday, December 2, 2025 at 7:43 AM
> > To: Robert Raszuk <[email protected]>
> > Cc: idr@ietf.org <[email protected]>, lsr <[email protected]>
> > Subject: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
> >
> > I agree with Robert that you could use RFC 9502 IGP Flex Algo in IP
> networks to build disjoint planes as desired.
> >
> > You could also use SRv6 with IGP Flex Algo (RFC 9350), which uses the
> IPv6 data plane, to build your disjoint planes.
> >
> > Thanks
> >
> > Gyan
> >
> > On Tue, Dec 2, 2025 at 6:32 AM Robert Raszuk <[email protected]> wrote:
> > Hi,
> >
> > With respect to the subject draft ... why would you not use IGP Flexible
> Algorithm for it?
> >
> > Are you now going to port years of work from IGP to BGP to achieve the
> same thing?
> >
> > Besides, in a non-blocking fabric latency is really not a factor. So you
> want to logically partition it, making it blocking, and then worry about what
> travels on which logical plane? Is this a reasonable direction?
> >
> > Thx,
> > R.
> >
> > ---------- Forwarded message ---------
> > From: <[email protected]>
> > Date: Mon, Dec 1, 2025 at 10:49 PM
> > Subject: I-D Action: draft-wang-idr-dpf-00.txt
> > To: <[email protected]>
> >
> >
> > Internet-Draft draft-wang-idr-dpf-00.txt is now available.
> >
> >    Title:   BGP Deterministic Path Forwarding (DPF)
> >    Authors: Kevin Wang
> >             Michal Styszynski
> >             Wen Lin
> >             Mahesh Subramaniam
> >             Thomas Kampa
> >             Diptanshu Singh
> >    Name:    draft-wang-idr-dpf-00.txt
> >    Pages:   18
> >    Dates:   2025-12-01
> >
> > Abstract:
> >
> >    Modern data center (DC) fabrics typically employ Clos topologies with
> >    External BGP (EBGP) for plain IPv4/IPv6 routing.  While hop-by-hop
> >    EBGP routing is simple and scalable, it provides only a single best-
> >    effort forwarding service for all types of traffic.  This single
> >    best-effort service might be insufficient for increasingly diverse
> >    traffic requirements in modern DC environments.  For example, loss
> >    and latency sensitive AI/ML flows may demand stronger Service Level
> >    Agreements (SLA) than general purpose traffic.  Duplication schemes
> >    which are standardized through protocols such as Parallel Redundancy
> >    Protocol (PRP) require disjoint forwarding paths to avoid single
> >    points of failure.  Congestion avoidance may require more
> >    deterministic forwarding behavior.
> >
> >    This document introduces BGP Deterministic Path Forwarding (DPF), a
> >    mechanism that partitions the physical fabric into multiple logical
> >    fabrics.  Flows can be mapped to different logical fabrics based on
> >    their specific requirements, enabling deterministic forwarding
> >    behavior within the data center.
> >
> > The IETF datatracker status page for this Internet-Draft is:
> > https://datatracker.ietf.org/doc/draft-wang-idr-dpf/
> >
> > There is also an HTML version available at:
> > https://www.ietf.org/archive/id/draft-wang-idr-dpf-00.html
> >
> > Internet-Drafts are also available by rsync at:
> > rsync.ietf.org::internet-drafts
> >
> >
> > _______________________________________________
> > I-D-Announce mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
> > _______________________________________________
> > Idr mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
>
>

_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
