Hi Gyan,

Thanks for the comments. I will follow the recommendation w.r.t. MC-LAG.

About the problem statement: the solution you are describing is not what this draft is about. There is actually no BGP session between the host and the DAG IP. That problem is solved in a different draft. Here, the connected hosts are simple appliances NOT running BGP, connected via L2/Ethernet to the FHR (first hop router). An appliance can be a CE, a switch, etc. The connected interfaces are L3 instead of the usual L2 interfaces. The draft explains how to sync the ARP/ND tables and a few more things.

Regards,
Patrice Brissette
Distinguished Engineer
Cisco Systems
From: Gyan Mishra <[email protected]>
Date: Saturday, September 6, 2025 at 23:35
To: Patrice Brissette (pbrisset) <[email protected]>
Cc: [email protected] <[email protected]>, [email protected] <[email protected]>
Subject: Re: [bess] draft-mackenzie-bess-evpn-l3mh-proto

Hi Patrice & authors,

Excellent work on coming up with this solution for L3 over L2 MC-LAG.

I am curious about the use cases and the problem this solution solves.

When I think of MC-LAG, I think of proprietary legacy implementations of multi-chassis LAG such as Cisco vPC or Juniper MC-LAG, whereas modern LAG uses the EVPN fabric for ARP/ND synchronization, which does not require ICCP or a proprietary link between the leafs.

My recommendation would be to not mention MC-LAG by itself and to call it ESI MC-LAG, which is the modern EVPN fabric based LAG used with MPLS or VXLAN fabrics.

I have some questions regarding 1.1 in the problem statement.

My understanding, AFAIK, is that BGP over ESI LAG is very common in modern VXLAN or MPLS fabric based DCs, where the host is eBGP-peered to the RFC 9135 inter-subnet forwarding Distributed Anycast Gateway (DAG) IP, with a single session hashed to the DF leaf and synchronized with the NDF leaf via the EVPN fabric.

Here is how it works.

Below describes how an all-active multihomed host interacts with an Ethernet VPN (EVPN) fabric using an anycast gateway and BGP. The mechanism ensures redundancy, seamless failover, and load balancing for both L2 and L3 traffic.

Here is a breakdown of the process:

1. Anycast gateway and host peering

· Anycast IP: A host is typically connected to two or more leaf switches via a Link Aggregation Group (LAG). All leaf switches connected to the same Ethernet Segment Identifier (ESI) share the same IP and MAC address, called the anycast gateway.
   o eBGP peering: The multihomed host establishes an External BGP (eBGP) peering session with the anycast gateway IP address. Since the IP is the same on both leaf switches, the host sees a single gateway.
   o Designated Forwarder (DF) election: EVPN uses a DF election algorithm to determine which leaf switch is the DF for a specific Ethernet segment. The DF is responsible for forwarding Broadcast, Unknown-unicast, and Multicast (BUM) traffic to the host. The other leaf is the non-DF (NDF).
   o BGP session via DF: The host's eBGP session will be established over the LAG member connected to the DF leaf. This is because the DF holds the active ARP/ND entry for the host.

2. Seamless failover

· ARP/ND synchronization: EVPN synchronizes the host's ARP (for IPv4) and ND (for IPv6) information across the fabric using EVPN Type-2 routes (MAC/IP Advertisement routes). This means the NDF leaf is also aware of the host's IP and MAC address.
· Fabric notification and failover: If the DF leaf switch fails, the eBGP session drops. The NDF leaf, having already been synchronized with the host's reachability information, takes over as the new DF. This provides a seamless failover, as the host's BGP peering is quickly re-established with the new DF leaf using the same anycast gateway IP address.

3. Traffic flow management

· Load balancing for host-advertised subnets: When the multihomed host advertises subnets via BGP into the EVPN fabric, the fabric sees the routes originating from both the DF and NDF leaf switches (with the same ESI). This allows the fabric to use Equal-Cost Multipath (ECMP) routing to load balance incoming traffic flows across both all-active links.
· EVPN procedures for loop prevention:
   o Split horizon: This mechanism prevents a BUM packet from being forwarded back to the multihomed host it originated from. For VXLAN, this is typically done using the source IP address of the VTEP (the leaf switch) in the tunnel header to prevent the packet from looping back.
   o Local bias: With local bias, when a leaf switch receives BUM traffic from a remote VTEP that is also part of a shared Ethernet segment, it will not forward that traffic out of its local port for that segment. This is the main VXLAN-based mechanism for split horizon filtering.
   o Backup path aliasing (anycast aliasing): This is an optimization that helps remote leaf switches load balance traffic toward a multihomed site. It allows load balancing across all leaf switches attached to the same ESI, ensuring efficient use of all paths.

Thanks
Gyan

On Fri, Sep 5, 2025 at 4:55 PM Patrice Brissette (pbrisset) <[email protected]> wrote:

Hi,

We believe this draft is ready for WG adoption. How can we move it forward?

Draft is here: https://datatracker.ietf.org/doc/draft-mackenzie-bess-evpn-l3mh-proto/

Regards,
Patrice Brissette
Distinguished Engineer
Cisco Systems

_______________________________________________
BESS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
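
For readers following the thread, the DF election and failover steps in the quoted message can be illustrated with a small Python sketch. It assumes the default modulo-based procedure from RFC 7432 (section 8.5), where the PEs attached to an Ethernet segment are ordered by IP address and the Ethernet tag (for example, the VLAN ID) modulo the number of PEs selects the DF. The function name `elect_df` and the leaf addresses are hypothetical and do not come from the draft or from either message; real deployments may use a different election algorithm.

```python
import ipaddress


def elect_df(pe_addresses, ethernet_tag):
    """Pick the Designated Forwarder for one Ethernet segment.

    Sketch of the default RFC 7432 procedure: order the PEs attached to
    the segment by numeric IP address, then select index (V mod N), where
    V is the Ethernet tag and N is the number of PEs.
    """
    if not pe_addresses:
        raise ValueError("no PE is attached to this Ethernet segment")
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[ethernet_tag % len(ordered)]


# Hypothetical leaf pair sharing one ESI.
leafs = ["192.0.2.2", "192.0.2.1"]

df_vlan_100 = elect_df(leafs, 100)  # 100 mod 2 = 0, lowest address wins
df_vlan_101 = elect_df(leafs, 101)  # 101 mod 2 = 1, next address wins

# Failover: once the failed leaf's route is withdrawn, the election is
# re-run over the remaining PEs and the former NDF becomes the DF.
survivors = [leaf for leaf in leafs if leaf != df_vlan_100]
df_after_failure = elect_df(survivors, 100)
```

Because the election depends only on the ordered PE list and the tag, every leaf on the segment computes the same result independently, which is why the former NDF can take over without extra negotiation once the failed leaf drops out of the list.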
