Hi Kevin,

I am sceptical that the proposed extension is desirable in the BGP protocol.
But this is just my own opinion, and as Jeff says I may be "in the rough" on it.

But reading your proposal, I do think that marking this coloring on a
per-BGP-session basis (strict or loose) is a very bad idea. We departed
from any per-session marking when the MP-BGP extensions were introduced.
So if you want to continue, I recommend a much more granular coloring
capability, ideally on a per-NLRI/UPDATE message basis.

With your current proposal you have created a physical partitioning, not
a logical one.
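
To make the distinction concrete, here is a minimal sketch in Python. It is
not taken from the draft: the `Route` type, its `color` field and both helper
functions are hypothetical names used only for illustration. It compares which
colors each session ends up carrying when the color is a property of the
session versus a property of each route.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Route:
    prefix: str
    session: str                 # BGP session the route was learned over
    color: Optional[str] = None  # hypothetical per-route (per-UPDATE) color


def colors_with_session_marking(session_color: Dict[str, str],
                                routes: List[Route]) -> Dict[str, Set[str]]:
    """Per-session marking: a route inherits the color of its session."""
    carried: Dict[str, Set[str]] = {}
    for route in routes:
        carried.setdefault(route.session, set()).add(session_color[route.session])
    return carried


def colors_with_route_marking(routes: List[Route]) -> Dict[str, Set[str]]:
    """Per-route marking: the color travels with each route."""
    carried: Dict[str, Set[str]] = {}
    for route in routes:
        if route.color is not None:
            carried.setdefault(route.session, set()).add(route.color)
    return carried


# Assumed example data: one leaf with sessions to two spines.
routes = [
    Route("10.0.0.0/24", "to-spine1", "red"),
    Route("10.0.1.0/24", "to-spine1", "blue"),
    Route("10.0.0.0/24", "to-spine2", "blue"),
]
session_color = {"to-spine1": "red", "to-spine2": "blue"}

per_session = colors_with_session_marking(session_color, routes)
per_route = colors_with_route_marking(routes)
print(per_session)
print(per_route)
```

With session marking, the session to spine1 can only ever carry "red", so the
link itself belongs to one plane. With route marking, the same session carries
both "red" and "blue" routes, which is what I mean by a logical partition.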

Also, can you elaborate in your draft (keeping in mind BGP's native
recursion) why the BGP CAR or BGP CT proposals fail to address your
objectives? Are they broken and in need of fixing, or do you just prefer to
start fresh with yet one more way to achieve the same thing?

Thank you,
R.



On Thu, Dec 4, 2025 at 10:18 PM Wang, Kevin <[email protected]> wrote:

> Hi Robert,
>
> Unless we have perfect load balancing, congestion is always possible, even
> in a non-blocking Clos fabric. Also, there are other scenarios where
> avoiding fate-sharing paths is crucial.
>
> Thanks,
> Kevin
>
> *From: *Robert Raszuk <[email protected]>
> *Date: *Wednesday, December 3, 2025 at 3:59 PM
> *To: *Wang, Kevin <[email protected]>
> *Cc: *Gyan Mishra <[email protected]>, idr@ietf.org <[email protected]>,
> lsr <[email protected]>
> *Subject: *Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
>
> Hi Kevin,
>
> Your draft explains how to do poor man's flex algo in BGP - ok.
>
> But could you elaborate on why anyone would do that (and push more
> complexity) in a non-blocking Clos fabric?
>
> Cheers,
> R.
>
>
>
> On Wed, Dec 3, 2025 at 7:35 PM Wang, Kevin <[email protected]> wrote:
>
> Hi Robert,
>
> Thank you for providing further details about your thoughts. What I heard
> is that IGPs were not initially adopted in DC fabrics because of their
> scaling issues (mostly LSDB flooding), especially among the hyperscalers. I
> understand that there were later efforts to address the scaling issues on
> the IGP side. I see your experience of using ISIS to successfully
> construct the fabric as a good example. Yes, it might be worth writing an
> informational RFC on ISIS for DC fabrics, serving as an alternative to RFC
> 7938. There are also other efforts to bring traffic engineering
> technologies, such as RSVP, MPTE, etc., to the DC fabrics. Like any other
> network, the DC fabrics will probably also evolve over time.
>
> Having said that, most of today’s DC fabrics (at least for those DC
> customers I have dealt with) are designed following RFC 7938:
>
>    - Use Clos topology
>    - Use IP forwarding
>    - Use EBGP as the underlay routing protocol
>
> I guess the choices above were made for technical as well as business
> reasons. BGP DPF is developed under the assumptions/observations above. I
> agree that the DC fabrics might evolve and adopt other technologies, such as
> IGP or RSVP, in the future. For the time being and the foreseeable future,
> BGP DPF would help to provide lightweight traffic engineering for the DC
> fabrics.
>
> Thanks,
> Kevin
>
> *From: *Robert Raszuk <[email protected]>
> *Date: *Tuesday, December 2, 2025 at 2:46 PM
> *To: *Wang, Kevin <[email protected]>
> *Cc: *Gyan Mishra <[email protected]>, idr@ietf.org <[email protected]>,
> lsr <[email protected]>
> *Subject: *Re: [Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
>
> Dear Kevin,
>
> I know very well what RFC 7938 says. In fact I did review this document
> well before it became an RFC :)
>
> But what happened next is that, while RFC 7938 makes a valid observation on
> how one can build MSDCs, lots of folks misinterpreted it as the only guide
> on how to build even a few racks of DC fabric.
>
> So yes, using BGP to construct dynamic routing in the DC fabrics has its
> use cases that are really applicable to only a handful of deployments. And
> I am not aware that any of the MSDCs would be asking you for logical
> transport planes within their fabrics.
>
> All other DCs would be much better off using IGP for underlay and BGP for
> overlay as a design pattern.
>
> When I constructed 10 full racks of hardware using ISIS, folks were shocked
> and pointed out that I was not using an IETF standard approach :). Then,
> when I demonstrated that connectivity is restored in less than 50 ms after
> any node or link failure, the masks came off.
>
> Maybe what is actually needed is an informational RFC, just like RFC 7938,
> simply illustrating that one can construct a DC using ISIS. It is obvious
> to me, but I admit there is no RFC I am aware of that shows operators that
> "Large-Scale Data Centers" can be robustly built with IGPs.
>
> Kind regards,
> Robert
>
>
> On Tue, Dec 2, 2025 at 7:24 PM Wang, Kevin <[email protected]> wrote:
>
> Hi Robert and Gyan,
>
> Thanks for your feedback! Your observation is correct that IGP Flex Algo
> could achieve the same. BGP DPF can be thought of as a BGP counterpart of
> IGP Flex Algo to some extent (though not precisely).
>
> As explained in the “Introduction” section of this draft, BGP DPF is
> designed for the current IP fabric environment where EBGP is usually the
> only protocol used for routing. Section 5 of RFC 7938 explains why DC
> fabrics use EBGP as the sole routing protocol.
>
> Thanks,
> Kevin
>
> *From: *Gyan Mishra <[email protected]>
> *Date: *Tuesday, December 2, 2025 at 7:43 AM
> *To: *Robert Raszuk <[email protected]>
> *Cc: *idr@ietf.org <[email protected]>, lsr <[email protected]>
> *Subject: *[Idr] Re: Fwd: I-D Action: draft-wang-idr-dpf-00.txt
>
> I agree with Robert that you could use RFC 9502 IGP Flex Algo in IP
> networks to build disjoint planes as desired.
>
> You could also use SRv6 with IGP Flex Algo for SR (RFC 9350), which uses
> the IPv6 data plane, to build your disjoint planes.
>
> Thanks
>
> Gyan
>
> On Tue, Dec 2, 2025 at 6:32 AM Robert Raszuk <[email protected]> wrote:
>
> Hi,
>
> With respect to the subject draft ... why would you not use IGP Flexible
> Algorithm for it?
>
> Are you now going to port years of work from IGP to BGP to achieve the
> same?
>
> Besides, in a non-blocking fabric latency is really not a factor. So you
> want to logically partition it, making it blocking, and then worry about
> what travels on which logical plane? Is this a reasonable direction?
>
> Thx,
> R.
>
> ---------- Forwarded message ---------
> From: <[email protected]>
> Date: Mon, Dec 1, 2025 at 10:49 PM
> Subject: I-D Action: draft-wang-idr-dpf-00.txt
> To: <[email protected]>
>
>
> Internet-Draft draft-wang-idr-dpf-00.txt is now available.
>
>    Title:   BGP Deterministic Path Forwarding (DPF)
>    Authors: Kevin Wang
>             Michal Styszynski
>             Wen Lin
>             Mahesh Subramaniam
>             Thomas Kampa
>             Diptanshu Singh
>    Name:    draft-wang-idr-dpf-00.txt
>    Pages:   18
>    Dates:   2025-12-01
>
> Abstract:
>
>    Modern data center (DC) fabrics typically employ Clos topologies with
>    External BGP (EBGP) for plain IPv4/IPv6 routing.  While hop-by-hop
>    EBGP routing is simple and scalable, it provides only a single best-
>    effort forwarding service for all types of traffic.  This single
>    best-effort service might be insufficient for increasingly diverse
>    traffic requirements in modern DC environments.  For example, loss
>    and latency sensitive AI/ML flows may demand stronger Service Level
>    Agreements (SLA) than general purpose traffic.  Duplication schemes
>    which are standardized through protocols such as Parallel Redundancy
>    Protocol (PRP) require disjoint forwarding paths to avoid single
>    points of failure.  Congestion avoidance may require more
>    deterministic forwarding behavior.
>
>    This document introduces BGP Deterministic Path Forwarding (DPF), a
>    mechanism that partitions the physical fabric into multiple logical
>    fabrics.  Flows can be mapped to different logical fabrics based on
>    their specific requirements, enabling deterministic forwarding
>    behavior within the data center.
>
> The IETF datatracker status page for this Internet-Draft is:
> https://datatracker.ietf.org/doc/draft-wang-idr-dpf/
>
> There is also an HTML version available at:
> https://www.ietf.org/archive/id/draft-wang-idr-dpf-00.html
>
> Internet-Drafts are also available by rsync at:
> rsync.ietf.org::internet-drafts
>
>
> _______________________________________________
> I-D-Announce mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> _______________________________________________
> Idr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
>
_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
