Thanks for the preliminary -05 revision. It answers a lot of my questions.
However, now that I better understand the "overlay index" concept, I
have gone a bit deeper into the details of your use cases, and have some
comments on them in-line in the attached document.
Probably the biggest issue is that I'm not sure it is quite clear
precisely how you tell whether an overlay index is present in an RT-5,
or precisely how you determine which kind of overlay index is present.
BESS Workgroup J. Rabadan, Ed.
Internet Draft W. Henderickx
Intended status: Standards Track Nokia
J. Drake
W. Lin
Juniper
A. Sajassi
Cisco
Expires: September 23, 2017 March 22, 2017
IP Prefix Advertisement in EVPN
draft-ietf-bess-evpn-prefix-advertisement-05
Abstract
EVPN provides a flexible control plane that allows intra-subnet
connectivity in an IP/MPLS and/or an NVO-based network. In some
networks, there is also a need for a dynamic and efficient inter-
subnet connectivity across Tenant Systems and End Devices that can be
physical or virtual and do not necessarily participate in dynamic
routing protocols. This document defines a new EVPN route type for
the advertisement of IP Prefixes and explains some use-case examples
where this new route-type is used.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
Rabadan et al. Expires September 23, 2017 [Page 1]
Internet-Draft EVPN Prefix Advertisement March 22, 2017
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on September 22, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Introduction and problem statement . . . . . . . . . . . . . . 3
2.1 Inter-subnet connectivity requirements in Data Centers . . . 4
2.2 The requirement for a new EVPN route type . . . . . . . . . 6
3. The BGP EVPN IP Prefix route . . . . . . . . . . . . . . . . . 7
3.1 IP Prefix Route encoding . . . . . . . . . . . . . . . . . . 8
3.2 Overlay Indexes and Recursive Lookup Resolution . . . . . . 10
4. IP Prefix Overlay Index use-cases . . . . . . . . . . . . . . . 11
4.1 TS IP address Overlay Index use-case . . . . . . . . . . . . 11
4.2 Floating IP Overlay Index use-case . . . . . . . . . . . . . 14
4.3 Bump-in-the-wire use-case . . . . . . . . . . . . . . . . . 16
4.4 IP-VRF-to-IP-VRF model . . . . . . . . . . . . . . . . . . . 18
4.4.1 Interface-less IP-VRF-to-IP-VRF model . . . . . . . . . 19
4.4.2 Interface-full IP-VRF-to-IP-VRF with core-facing IRB . . 22
4.4.3 Interface-full IP-VRF-to-IP-VRF with unnumbered
core-facing IRB . . . . . . . . . . . . . . . . . . . . 25
5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6. Conventions used in this document . . . . . . . . . . . . . . . 29
7. Security Considerations . . . . . . . . . . . . . . . . . . . . 29
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 29
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
9.1 Normative References . . . . . . . . . . . . . . . . . . . . 29
9.2 Informative References . . . . . . . . . . . . . . . . . . . 30
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 30
11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 30
12. Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 30
1. Terminology
GW IP: Gateway IP Address
IPL: IP address length
IRB: Integrated Routing and Bridging interface
**** Nit: In the document, IRB is sometimes used to mean "Integrated Routing
**** and Bridging interface" and sometimes to mean "Integrated Routing and
**** Bridging".
ML: MAC address length
NVE: Network Virtualization Edge
TS: Tenant System
VA: Virtual Appliance
RT-2: EVPN route type 2, i.e. MAC/IP advertisement route
RT-5: EVPN route type 5, i.e. IP Prefix route
AC: Attachment Circuit
Ethernet NVO tunnel: it refers to Network Virtualization Overlay
tunnels with Ethernet payload. Examples of this type of tunnels are
VXLAN or nvGRE.
IP NVO tunnel: it refers to Network Virtualization Overlay tunnels
with IP payload (no MAC header in the payload).
MAC-VRF: A Virtual Routing and Forwarding table for Media Access
Control (MAC) addresses on an NVE/PE, as per [RFC7432].
IP-VRF: A VPN Routing and Forwarding table for IP addresses on an
NVE/PE, similar to the VRF concept defined in [RFC4364], however, in
this document, the IP routes are always populated by the EVPN address
family.
2. Introduction and problem statement
Inter-subnet connectivity is required for certain tenants within the
Data Center. [EVPN-INTERSUBNET] defines some fairly common inter-
subnet forwarding scenarios where TSes can exchange packets with TSes
located in remote subnets. In order to meet this requirement,
[EVPN-INTERSUBNET] describes how MAC/IPs encoded in TS RT-2 routes
are not only used to populate MAC-VRF and overlay ARP tables, but
also IP-VRF tables with the encoded TS host routes (/32 or /128). In
some cases, EVPN may advertise IP Prefixes and therefore provide
aggregation in the IP-VRF tables, as opposed to programming individual
host routes. This document complements the scenarios described in
[EVPN-INTERSUBNET] and defines how EVPN may be used to advertise IP
Prefixes. Interoperability between EVPN and L3VPN [RFC4364] IP Prefix
routes is out of the scope of this document.
Section 2.1 describes the inter-subnet connectivity requirements in
Data Centers. Section 2.2 explains why a new EVPN route type is
required for IP Prefix advertisements. Once the need for a new EVPN
**** Nit: if you say "used" instead of "required", you'll avoid arguments
**** with annoying reviewers who say "it's not really required, you could
**** have done it another way" ;-)
route type is justified, sections 3, 4 and 5 will describe this route
type and how it is used in some specific use cases.
2.1 Inter-subnet connectivity requirements in Data Centers
[RFC7432] is used as the control plane for a Network Virtualization
Overlay (NVO3) solution in Data Centers (DC), where Network
Virtualization Edge (NVE) devices can be located in Hypervisors or
TORs, as described in [EVPN-OVERLAY].
If we use the term Tenant System (TS) to designate a physical or
virtual system identified by MAC and IP addresses, and connected to a
MAC-VRF by an Attachment Circuit, the following considerations apply:
o The Tenant Systems may be Virtual Machines (VMs) that generate
traffic from their own MAC and IP.
o The Tenant Systems may be Virtual Appliance entities (VAs) that
forward traffic to/from IP addresses of different End Devices
sitting behind them.
o These VAs can be firewalls, load balancers, NAT devices, other
appliances or virtual gateways with virtual routing instances.
o These VAs do not necessarily participate in dynamic routing
protocols and hence rely on the EVPN NVEs to advertise the
routes on their behalf.
o In all these cases, the VA will forward traffic to other TSes
using its own source MAC but the source IP will be the one
associated to the End Device sitting behind or a translated IP
address (part of a public NAT pool) if the VA is performing
NAT.
o Note that the same IP address could exist behind two of these
TS. One example of this would be certain appliance resiliency
mechanisms, where a virtual IP or floating IP can be owned by
one of the two VAs running the resiliency protocol (the master
VA). VRRP is one particular example of this. Another example
**** Nit: At least four ADs and the RFC Editor will point out that "VRRP" is
**** not expanded at first occurrence, nor is a reference for it given ;-)
is multi-homed subnets, i.e. the same subnet is connected to
two VAs.
o Although these VAs provide IP connectivity to VMs and subnets
behind them, they do not always have their own IP interface
connected to the EVPN NVE, e.g. layer-2 firewalls are examples
of VAs not supporting IP interfaces.
Figure 1 illustrates some of the examples described above.
NVE1
+-----------+
TS1(VM)--|(MAC-VRF10)|-----+
IP1/M1 +-----------+ | DGW1
+---------+ +-------------+
| |----|(MAC-VRF10) |
SN1---+ NVE2 | | | IRB1\ |
| +-----------+ | | | (IP-VRF)|---+
SN2---TS2(VA)--|(MAC-VRF10)|-| | +-------------+ _|_
| IP2/M2 +-----------+ | VXLAN/ | ( )
IP4---+ <-+ | nvGRE | DGW2 ( WAN )
| | | +-------------+ (___)
vIP23 (floating) | |----|(MAC-VRF10) | |
| +---------+ | IRB2\ | |
SN1---+ <-+ NVE3 | | | | (IP-VRF)|---+
| IP3/M3 +-----------+ | | | +-------------+
SN3---TS3(VA)--|(MAC-VRF10)|---+ | |
| +-----------+ | |
IP5---+ | |
| |
NVE4 | | NVE5 +--SN5
+---------------------+ | | +-----------+ |
IP6------|(MAC-VRF1) | | +-|(MAC-VRF10)|--TS4(VA)--SN6
| \ | | +-----------+ |
| (IP-VRF) |--+ ESI4 +--SN7
| / \IRB3 |
|---|(MAC-VRF2)(MAC-VRF10)|
SN4| +---------------------+
Figure 1 DC inter-subnet use-cases
Where:
NVE1, NVE2, NVE3, NVE4, NVE5, DGW1 and DGW2 share the same EVI for a
particular tenant. EVI-10 is comprised of the collection of MAC-VRF10
instances defined in all the NVEs. All the hosts connected to EVI-10
belong to the same IP subnet. The hosts connected to EVI-10 are
listed below:
o TS1 is a VM that generates/receives traffic from/to IP1, where
IP1 belongs to the EVI-10 subnet.
o TS2 and TS3 are Virtual Appliances (VA) that generate/receive
traffic from/to the subnets and hosts sitting behind them
(SN1, SN2, SN3, IP4 and IP5). Their IP addresses (IP2 and IP3)
belong to the EVI-10 subnet and they can also generate/receive
traffic. When these VAs receive packets destined to their own
MAC addresses (M2 and M3) they will route the packets to the
proper subnet or host. These VAs do not support routing
protocols to advertise the subnets connected to them and can
move to a different server and NVE when the Cloud Management
System decides to do so. These VAs may also support redundancy
mechanisms for some subnets, similar to VRRP, where a floating
IP is owned by the master VA and only the master VA forwards
traffic to a given subnet. E.g.: vIP23 in figure 1 is a
floating IP that can be owned by TS2 or TS3 depending on who
the master is. Only the master will forward traffic to SN1.
o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3
have their own IP addresses that belong to the EVI-10 subnet
too. These IRB interfaces connect the EVI-10 subnet to Virtual
Routing and Forwarding (IP-VRF) instances that can route the
traffic to other connected subnets for the same tenant (within
the DC or at the other end of the WAN).
o TS4 is a layer-2 VA that provides connectivity to subnets SN5,
SN6 and SN7, but does not have an IP address itself in the
EVI-10. TS4 is connected to a physical port on NVE5 assigned
to Ethernet Segment Identifier 4.
All the above DC use cases require inter-subnet forwarding and
therefore the individual host routes and subnets:
a) MUST be advertised from the NVEs (since VAs and VMs do not
participate in dynamic routing protocols) and
b) MAY be associated to an Overlay Index that can be a VA IP address,
a floating IP address or an ESI. An Overlay Index is a next-hop
that requires a recursive resolution and it is described in
section 3.2.
**** Section 3.2 seems to also allow the Overlay Index to be a MAC
**** address. That possibility should be mentioned here as well.
2.2 The requirement for a new EVPN route type
[RFC7432] defines a MAC/IP route (also referred to as RT-2) where a MAC
address can be advertised together with an IP address length (IPL)
and IP address (IP). While a variable IPL might have been used to
indicate the presence of an IP prefix in a route type 2, there are
several specific use cases in which using this route type to deliver
IP Prefixes is not suitable.
One example of such use cases is the "floating IP" example described
in section 2.1. In this example we need to decouple the advertisement
of the prefixes from the advertisement of the floating IP (vIP23 in
Figure 1) and MAC associated to it, otherwise the solution gets
highly inefficient and does not scale.
E.g.: if we are advertising 1k prefixes from M2 (using RT-2) and the
floating IP owner changes from M2 to M3, we would need to withdraw 1k
routes from M2 and re-advertise 1k routes from M3. However if we use
a separate route type, we can advertise the 1k routes associated to
the floating IP address (vIP23) and only one RT-2 for advertising the
ownership of the floating IP, i.e. vIP23 and M2 in the route type 2.
When the floating IP owner changes from M2 to M3, a single RT-2
withdraw/update is required to indicate the change. The remote DGW
will not change any of the 1k prefixes associated to vIP23, but will
only update the ARP resolution entry for vIP23 (now pointing at M3).
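**** The arithmetic in the example above can be sketched as follows. This is
**** only an illustration; the function names are hypothetical and the counts
**** simply restate the 1k-prefix example from the text.

```python
def churn_coupled(num_prefixes: int) -> int:
    """Route changes on an owner switch when every prefix is carried in an RT-2."""
    withdraws = num_prefixes        # withdraw every route advertised with M2
    advertisements = num_prefixes   # re-advertise every route with M3
    return withdraws + advertisements


def churn_decoupled(num_prefixes: int) -> int:
    """Route changes when the prefixes point at vIP23 via a separate route type."""
    # Only the single RT-2 binding vIP23 to a MAC is withdrawn/updated;
    # the prefix routes themselves are untouched, whatever their number.
    return 1


if __name__ == "__main__":
    print("coupled:", churn_coupled(1000))
    print("decoupled:", churn_decoupled(1000))
```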
Other reasons to decouple the IP Prefix advertisement from the MAC/IP
route are listed below:
o Clean identification, operation and troubleshooting of IP
Prefixes, independent of and not subject to the interpretation
of the IPL and the IP value. E.g.: a default IP route
0.0.0.0/0 must always be easily and clearly distinguished from
the absence of IP information.
o MAC address information must not be compared by BGP when
choosing which of several IP Prefix routes to install in a
given IP-VRF. If IP Prefixes were to be advertised using
MAC/IP routes, the MAC information would always be present and
part of the route key.
**** Perhaps begin the last sentence above with "In MAC/IP routes, the MAC
**** information is part of the NLRI, so if IP Prefixes were ..."
The following sections describe how EVPN is extended with a new route
type for the advertisement of IP prefixes and how this route is used
to address the current and future inter-subnet connectivity
requirements existing in the Data Center.
3. The BGP EVPN IP Prefix route
The current BGP EVPN NLRI as defined in [RFC7432] is shown below:
+-----------------------------------+
| Route Type (1 octet) |
+-----------------------------------+
| Length (1 octet) |
+-----------------------------------+
| Route Type specific (variable) |
+-----------------------------------+
Where the route type field can contain one of the following specific
values (refer to the IANA "EVPN Route Types" registry):
+ 1 - Ethernet Auto-Discovery (A-D) route
+ 2 - MAC/IP advertisement route
+ 3 - Inclusive Multicast Route
+ 4 - Ethernet Segment Route
This document defines an additional route type that IANA has added to
the registry, and will be used for the advertisement of IP Prefixes:
+ 5 - IP Prefix Route
The support for this new route type is OPTIONAL.
Since this new route type is OPTIONAL, an implementation not
supporting it MUST ignore the route, based on the unknown route type
value, as specified by Section 5.4 in [RFC7606].
The detailed encoding of this route and associated procedures are
described in the following sections.
3.1 IP Prefix Route encoding
An IP Prefix advertisement route NLRI consists of the following
fields:
+---------------------------------------+
| RD (8 octets) |
+---------------------------------------+
|Ethernet Segment Identifier (10 octets)|
+---------------------------------------+
| Ethernet Tag ID (4 octets) |
+---------------------------------------+
| IP Prefix Length (1 octet) |
+---------------------------------------+
| IP Prefix (4 or 16 octets) |
+---------------------------------------+
| GW IP Address (4 or 16 octets) |
+---------------------------------------+
| MPLS Label (3 octets) |
+---------------------------------------+
Where:
o RD, Ethernet Tag ID and MPLS Label fields will be used as
defined in [RFC7432] and [EVPN-OVERLAY].
o The Ethernet Segment Identifier will be a non-zero 10-byte
identifier if the ESI is used as an overlay index (see the
definition of overlay index in section 3.2). It will be zero
otherwise.
o The IP Prefix Length can be set to a value between 0 and 32
(bits) for ipv4 and between 0 and 128 for ipv6, and specifies
the number of bits in the Prefix.
o The IP Prefix will be a 32 or 128-bit field (ipv4 or ipv6).
The size of this field does not depend on the value of the IP
Prefix Length field.
o The GW IP (Gateway IP Address) will be a 32 or 128-bit field
(ipv4 or ipv6), and will encode an overlay IP index for the IP
Prefixes. The GW IP field SHOULD be zero if it is not used as
an overlay index. Refer to section 3.2 for the definition and
use of the Overlay Index.
o The MPLS Label field is encoded as 3 octets, where the high-
order 20 bits contain the label value. The value SHOULD be
null (zero) when the IP Prefix route is used for a recursive
**** The reason I asked you to specify that "zero" means "null" is that RFCs
**** 3032 and 5036 use "3" to mean "implicit null" and RFC 3032 has multiple
**** encodings (0 and 2) for (different flavors of) "explicit null".
**** EVPN/MVPN tend to use zero to mean "no label value specified". Maybe
**** I'm the only one who still gets confused by all these different
**** encodings that are named "null" ;-)
lookup resolution. If the received MPLS Label value is not
null, the route MUST still be used for recursive lookup
resolution if the local policy instructs the ingress NVE to do
so.
**** So a non-null label value inhibits the recursive resolution unless local
**** policy says to do it anyway?? Is that really your intention, or is this
**** a retrofit for a buggy implementation ;-)
o The total route length will indicate the type of prefix (ipv4
or ipv6) and the type of GW IP address (ipv4 or ipv6). Note
that the IP Prefix + the GW IP should have a length of either
64 or 256 bits, but never 160 bits (ipv4 and ipv6 mixed values
are not allowed).
The Eth-Tag ID, IP Prefix Length and IP Prefix will be part of the
route key used by BGP to compare routes. The rest of the fields will
not be part of the route key.
**** As written, this text requires a Route Reflector to ignore the RD when
**** considering whether two routes are comparable. That's just not right.
**** You replied:
[JORGE] This is consistent with RFC7432, section 7, in which the RD is
assumed to be part of the route key but not mentioned. If we don't make
the description inconsistent, it may be confusing? I left the text as it is
for the time being.
**** You mean "if we don't make the description consistent", I think.
**** IMHO it's a rather bad practice for a specification to rely on things
**** that are assumed but not mentioned. I have noticed this mistake in RFC
**** 7432. It should not propagate into other documents. I think it would be
**** better to mention the RD here and to also say that mention of the RD
**** was inadvertently omitted from RFC 7432. Perhaps an erratum should be
**** opened on RFC 7432.
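**** To make the section 3.1 encoding concrete, here is a minimal sketch of a
**** parser for the route-type-specific part of an RT-5. It assumes the field
**** order and sizes shown in the figure, so the only valid lengths are 34
**** octets (ipv4) and 58 octets (ipv6). The names parse_rt5, IpPrefixRoute and
**** route_key are hypothetical, and route_key includes the RD as the comment
**** above argues it should, which is an assumption and not the draft's text.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# Fixed-size head: RD (8) + ESI (10) + Ethernet Tag ID (4) + IP Prefix Length (1)
FIXED_HEAD = 23
LABEL_LEN = 3
IPV4_BODY_LEN = FIXED_HEAD + 4 + 4 + LABEL_LEN      # 34 octets
IPV6_BODY_LEN = FIXED_HEAD + 16 + 16 + LABEL_LEN    # 58 octets


@dataclass(frozen=True)
class IpPrefixRoute:
    rd: bytes
    esi: bytes
    eth_tag: int
    prefix_len: int
    prefix: IPAddress
    gw_ip: IPAddress
    label: int

    def route_key(self) -> Tuple[bytes, int, int, IPAddress]:
        # Assumption: the RD is part of the key, per the review comment.
        return (self.rd, self.eth_tag, self.prefix_len, self.prefix)


def parse_rt5(body: bytes) -> IpPrefixRoute:
    """Parse the route-type-specific part of an RT-5 NLRI."""
    if len(body) == IPV4_BODY_LEN:
        addr_len, addr_cls = 4, ipaddress.IPv4Address
    elif len(body) == IPV6_BODY_LEN:
        addr_len, addr_cls = 16, ipaddress.IPv6Address
    else:
        # Also rejects mixed ipv4/ipv6 values (160 bits of addresses).
        raise ValueError(f"unexpected RT-5 length: {len(body)} octets")

    prefix_len = body[22]
    if prefix_len > addr_len * 8:
        raise ValueError(f"IP Prefix Length {prefix_len} out of range")

    prefix_end = FIXED_HEAD + addr_len
    gw_end = prefix_end + addr_len
    label_field = int.from_bytes(body[gw_end:gw_end + LABEL_LEN], "big")
    return IpPrefixRoute(
        rd=body[0:8],
        esi=body[8:18],
        eth_tag=int.from_bytes(body[18:22], "big"),
        prefix_len=prefix_len,
        prefix=addr_cls(body[FIXED_HEAD:prefix_end]),
        gw_ip=addr_cls(body[prefix_end:gw_end]),
        label=label_field >> 4,  # high-order 20 bits of the 3-octet field
    )


if __name__ == "__main__":
    example = (bytes(8) + bytes(10) + bytes(4) + bytes([24])
               + ipaddress.IPv4Address("10.0.1.0").packed
               + ipaddress.IPv4Address("192.0.2.2").packed
               + bytes(3))
    print(parse_rt5(example))
```

**** Note that the address family is inferred purely from the total length,
**** which is why the IP Prefix field cannot shrink with the prefix length.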
3.2 Overlay Indexes and Recursive Lookup Resolution
RT-5 routes support recursive lookup resolution through the use of
Overlay Indexes as follows:
o An Overlay Index can be an ESI, IP address (in the address
space of the tenant) or MAC address and it is used by an NVE
**** Consider removing the parentheses, as the fact that the IP address is
**** in the tenant's IP address space is crucial.
**** The end of section 2.1 suggests that an Overlay Index can only be an
**** ESI or IP address.
as the next-hop for a given IP Prefix. An Overlay Index always
needs a recursive route resolution on the NVE receiving the IP
Prefix route,
**** More precisely, recursive resolution of the Overlay Index needs to be
**** done by an NVE that installs the RT-5 route into one of its IP-VRFs.
**** But not at intermediate nodes that merely propagate the route. Note
**** that an intermediate node can also be an NVE, so it's important to note
**** that recursive resolution of the Overlay Index applies upon
**** installation into an IP-VRF, but not upon propagation. (This differs
**** from the ordinary recursive resolution of BGP next hops.)
**** Fortunately, IPVPN/EVPN interoperability is outside the scope of this
**** document, because there is no way to pass an Overlay Index to the
**** IPVPN routing. I don't know whether that's an issue or not, but it's
**** something to keep in mind when IPVPN/EVPN interworking is discussed.
that the NVE knows to which egress NVE it
needs to forward the packets. The egress NVE may not be the
**** Perhaps "may not" --> "need not", as "may not" could be interpreted to
**** be synonymous with "must not" in English.
same NVE that originated the RT-5.
o The Overlay Index is indicated along with the RT-5 in the ESI
field, GW IP field or Router's MAC Extended Community,
depending on whether the IP Prefix next-hop is an ESI, IP
address or MAC address in the tenant space. The Overlay Index
for a given IP Prefix is set by local policy (typically
managed by the Cloud Management System).
**** Set by local policy at the NVE that originates an RT-5 for that IP
**** prefix, not by local policy at the NVE installing the RT-5.
o In order to enable the recursive lookup resolution at the
ingress NVE, the egress NVE that owns the Overlay Index must
**** Perhaps: "the egress NVE that owns the Overlay Index" --> "an NVE that
**** is a possible egress NVE for a given Overlay Index"
advertise the location of the Overlay Index.
**** Perhaps: "must advertise the location of" --> "must originate a route
**** advertising itself as the BGP next hop on the path to the system
**** denoted by the Overlay Index".
For instance, if
the IP Prefix originating NVE sends an RT-5 with ESI-1 as
Overlay Index, then the ingress NVE will expect an RT-1 (Auto-
Discovery per-EVI route) with ESI-1 to be received from the
egress NVE. If the Overlay Index is encoded in the GW IP field
or the Router's MAC Extended Community, the ingress NVE will
expect an RT-2 (MAC/IP route) from the egress NVE so that the
Overlay Index can be resolved.
**** I find the above paragraph a bit hard to understand, as the phrase "the
**** ingress NVE will expect an RT ...from the egress NVE" sort of suggests
**** that the ingress NVE knows in advance who the egress NVE is. Perhaps
**** consider something like:
"For instance, if an NVE receives an RT-5 that specifies an Overlay
Index, the NVE cannot install the RT-5 in its IP-VRF unless (or
until) it can recursively resolve the Overlay Index. If the RT-5
specifies an ESI as the Overlay Index, recursive resolution can only
be done if the NVE has received and installed an RT-1 (Auto-Discovery
per-EVI) route specifying that ESI. If the RT-5 specifies a GW IP
address as the Overlay Index, recursive resolution can only be done
if the NVE has received and installed an RT-2 (MAC/IP route)
specifying that IP address in the IP address field of its NLRI. If
the RT-5 specifies a MAC address as the Overlay Index, recursive
resolution can only be done if the NVE has received and installed an
RT-2 (MAC/IP route) specifying that MAC address in the MAC address
field of its NLRI."
"Note that the RT-1 or RT-2 routes needed for the recursive resolution
may arrive before or after the given RT-5 route."
**** (It's always a good idea to mention that things can come out of order.)
**** Ordinarily, BGP will do recursive next hop resolution if there is a BGP
**** route but no IGP or static route to the BGP next hop. Here you are
**** requiring recursive resolution (upon installation of an RT-5 into an
**** IP-VRF) whenever there is an Overlay Index, independent of what sort of
**** route there is to the BGP next hop. This has to be made very clear.
**** Note also that if there is no IGP route to the BGP next hop of an RT-5,
**** BGP may fail to install the RT-5 even if the Overlay Index can be
**** resolved. This may cause some unexpected behavior if an egress NVE
**** goes down.
o If the ESI field is different than zero, the GW IP field will
be zero, and vice versa. A route containing a non-zero GW IP
and a non-zero ESI will be treated as-withdraw.
**** Isn't there a valid case where the GW IP and ESI fields are zero, but
**** the overlay index is carried in the Router's MAC EC? Above seems to
**** say that they cannot both be zero.
**** If both fields are zero and the Router's MAC EC is not present, do we
**** also want to do a "treat-as-withdraw"? I think it's a bit awkward to
**** sometimes have the Overlay Index in the NLRI and sometimes in the EC
**** attribute, but I suppose it's way too late to fix that.
**** Does an RT-5 always have to have an Overlay Index requiring recursive
**** resolution, or is there some way to specify in an RT-5 that the
**** ordinary BGP next hop field is to be used as the next hop? Is that the
**** case where both GW IP and ESI are zero but the Router's MAC EC is not
**** present?
**** Later on there is some suggestion that recursive resolution can be
**** avoided if the MPLS label field is non-zero. It would be good to have
**** one place that says, for each combination of zero/non-zero in the GW
**** IP, ESI, and Label fields, and the presence or absence of the Router's
**** MAC EC, what we can conclude about the Overlay Index.
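**** As a strawman for such a table, the sketch below enumerates the
**** combinations of ESI, GW IP and Router's MAC EC. Only the first rule
**** (both ESI and GW IP non-zero means treat-as-withdraw) is stated by the
**** draft. The precedence of ESI over the Router's MAC EC, and the reading
**** of "everything absent" as "no Overlay Index", are assumptions made here
**** for illustration. The Label field is deliberately left out because its
**** effect is exactly what is unclear. All names are hypothetical.

```python
from enum import Enum
from itertools import product


class OverlayIndex(Enum):
    ESI = "ESI"
    GW_IP = "GW IP"
    MAC = "Router's MAC"
    NONE = "none (use BGP next hop)"
    INVALID = "treat-as-withdraw"


def classify(esi_nonzero: bool, gw_ip_nonzero: bool, router_mac_ec: bool) -> OverlayIndex:
    """Decide which kind of Overlay Index a received RT-5 carries."""
    if esi_nonzero and gw_ip_nonzero:
        return OverlayIndex.INVALID   # stated in the draft
    if esi_nonzero:
        return OverlayIndex.ESI       # assumption: ESI wins over the MAC EC
    if gw_ip_nonzero:
        return OverlayIndex.GW_IP     # assumption: GW IP wins over the MAC EC
    if router_mac_ec:
        return OverlayIndex.MAC       # assumption: EC alone means MAC index
    return OverlayIndex.NONE          # assumption: nothing set means no index


if __name__ == "__main__":
    print(f"{'ESI!=0':<8}{'GWIP!=0':<9}{'MAC EC':<8}Overlay Index")
    for esi, gw, mac in product([False, True], repeat=3):
        print(f"{str(esi):<8}{str(gw):<9}{str(mac):<8}{classify(esi, gw, mac).value}")
```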
The use of Overlay Indexes decouples the origination of the RT-5 from
the desired egress NVE for the IP Prefix.
**** I find the above sentence confusing, but I think it can just be
**** omitted.
The indirection provided by
the Overlay Index and its recursive lookup resolution is required to
achieve fast convergence in case of a failure of the object
represented by the Overlay Index. For instance: in Figure 1, let's
assume NVE2/NVE3 advertise 1k RT-5 routes associated to the floating
IP address (GWIP=vIP23) and NVE2 advertises an RT-2 claiming the
ownership of the floating IP, i.e. NVE2 encodes vIP23 and M2 in the
RT-2. When the floating IP owner changes from M2 to M3, a single RT-2
withdraw/update is required to indicate the change. The remote DGW
will not change any of the 1k prefixes associated to vIP23, but will
only update the ARP resolution entry for vIP23 (now pointing at M3).
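**** The indirection described above can be sketched with three tables: IP
**** prefixes pointing at a GW IP Overlay Index, an ARP table built from RT-2
**** routes, and a MAC table giving the egress NVE. This is a toy model under
**** stated assumptions, not the draft's procedure: the class and method
**** names are hypothetical, 192.0.2.23 stands in for vIP23, and resolution
**** is done at lookup time, so an RT-5 that arrives before its RT-2 simply
**** stays unresolved until the RT-2 shows up.

```python
import ipaddress
from typing import Dict, Optional, Tuple


class IpVrf:
    def __init__(self) -> None:
        self.prefixes: Dict[ipaddress.IPv4Network, str] = {}  # prefix -> GW IP
        self.arp: Dict[str, str] = {}       # GW IP -> MAC (from RT-2)
        self.mac_fib: Dict[str, str] = {}   # MAC -> egress NVE (from RT-2)

    def add_rt5(self, prefix: str, gw_ip: str) -> None:
        self.prefixes[ipaddress.ip_network(prefix)] = gw_ip

    def add_rt2(self, ip: str, mac: str, egress_nve: str) -> None:
        # A new RT-2 for the same IP replaces the previous binding.
        self.arp[ip] = mac
        self.mac_fib[mac] = egress_nve

    def resolve(self, dst: str) -> Optional[Tuple[str, str]]:
        """Return (MAC, egress NVE) for dst, or None if unresolved."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.prefixes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)  # longest match
        mac = self.arp.get(self.prefixes[best])
        if mac is None:
            return None  # Overlay Index known, but no RT-2 received yet
        return mac, self.mac_fib[mac]


if __name__ == "__main__":
    vrf = IpVrf()
    for i in range(1000):  # 1k RT-5 routes, all with GW IP = vIP23
        vrf.add_rt5(f"10.{i // 256}.{i % 256}.0/24", "192.0.2.23")
    print(vrf.resolve("10.0.5.9"))               # RT-2 not received yet
    vrf.add_rt2("192.0.2.23", "M2", "NVE2")      # NVE2 claims vIP23
    print(vrf.resolve("10.0.5.9"))
    vrf.add_rt2("192.0.2.23", "M3", "NVE3")      # ownership moves to M3
    print(vrf.resolve("10.0.5.9"), len(vrf.prefixes))
```

**** The ownership change touches one ARP entry; none of the 1k prefix
**** entries is rewritten.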
The following table shows the different inter-subnet use-cases
described in this document and the corresponding coding of the
overlay index in the route type 5 (RT-5). The IP-VRF-to-IP-VRF or IRB
forwarding on NVEs case is a special use-case, where there may be no
need for Overlay Index, since the actual next-hop is given by the BGP
next-hop.
**** Then the table below should allow "none" as a possible Overlay Index
**** in the IP-VRF-to-IP-VRF case. Also, the draft should say very clearly
**** how you encode the fact that there is no Overlay Index.
When an Overlay Index is present in the RT-5, the receiving
NVE will need to perform a recursive route resolution to find the
egress NVE to forward the packets.
+----------------------------+--------------------------------------+
| Use-case | Overlay Index in the RT-5 BGP update |
+----------------------------+--------------------------------------+
| TS IP address | Overlay GW IP Address |
| Floating IP address | Overlay GW IP Address |
| "Bump in the wire" | ESI or MAC |
| IP-VRF-to-IP-VRF | Overlay GW IP, MAC or N/A |
+----------------------------+--------------------------------------+
**** This table says what kind of Overlay Index is needed in each use case,
**** but does not say how you know from a received RT-5 what kind of Overlay
**** Index is being advertised.
The above use-cases are representative of the different Overlay
Indexes supported by RT-5 (GW IP, ESI, MAC or N/A). Any other use-
case using a given Overlay Index, SHOULD follow the procedures
described in this document for the same Overlay Index.
4. IP Prefix Overlay Index use-cases
This section describes some use-cases for the Overlay Index types.
4.1 TS IP address Overlay Index use-case
The following figure illustrates an example of inter-subnet
forwarding for subnets sitting behind Virtual Appliances (on TS2 and
TS3).
SN1---+ NVE2 DGW1
| +-----------+ +---------+ +-------------+
SN2---TS2(VA)--|(MAC-VRF10)|-| |----|(MAC-VRF10) |
| IP2/M2 +-----------+ | | | IRB1\ |
IP4---+ | | | (IP-VRF)|---+
| | +-------------+ _|_
| VXLAN/ | ( )
| nvGRE | DGW2 ( WAN )
SN1---+ NVE3 | | +-------------+ (___)
| IP3/M3 +-----------+ | |----|(MAC-VRF10) | |
SN3---TS3(VA)--|(MAC-VRF10)|-| | | IRB2\ | |
| +-----------+ +---------+ | (IP-VRF)|---+
IP5---+ +-------------+
Figure 2 TS IP address use-case
An example of inter-subnet forwarding between subnet SN1/24 and a
subnet sitting in the WAN is described below. NVE2, NVE3, DGW1 and
DGW2 are running BGP EVPN. TS2 and TS3 do not participate in dynamic
routing protocols, and they only have a static route to forward the
traffic to the WAN.
In this case, a GW IP is used as an Overlay Index. Although a
different Overlay Index type could have been used, this use-case
assumes that the operator knows the VA's IP addresses beforehand,
whereas the VA's MAC address is unknown and the VA's ESI is zero.
Because of this, the GW IP is the suitable Overlay Index to be used
with the RT-5s. The NVEs know the GW IP to be used for a given Prefix
by policy.
(1) NVE2 advertises the following BGP routes on behalf of TS2:
o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
IP=IP2 and [RFC5512] BGP Encapsulation Extended Community with
the corresponding Tunnel-type. The MAC and IP addresses may be
learned via ARP-snooping (ND-snooping if IPv6).
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP2. The prefix and GW IP are learned by
policy.
(2) Similarly, NVE3 advertises the following BGP routes on behalf of
TS3:
o Route type 2 (MAC/IP route) containing: ML=48, M=M3, IPL=32,
IP=IP3 (and BGP Encapsulation Extended Community).
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP3.
(3) DGW1 and DGW2 import both received routes based on the
route-targets:
o Based on the MAC-VRF10 route-target in DGW1 and DGW2, the
MAC/IP route is imported and M2 is added to the MAC-VRF10
along with its corresponding tunnel information.
**** Above is about the RT-2 from NVE2. Don't the DGWs also import the RT-2
**** from NVE3?
For instance,
if VXLAN is used, the VTEP will be derived from the MAC/IP
route BGP next-hop and VNI from the MPLS Label1 field. IP2 -
M2 is added to the ARP table.
**** As well as IP3/M3, right?
o Based on the MAC-VRF10 route-target in DGW1 and DGW2, the IP
Prefix route is also imported and SN1/24 is added to the IP-
VRF with Overlay Index IP2 pointing at the local MAC-VRF10.
Should ECMP be enabled in the IP-VRF, SN1/24 would also be
added to the routing table with Overlay Index IP3.
**** With regard to the RT-5s, isn't it true that BGP bestpath selection is
**** applied, and one of the RT-5s (either from NVE2 or NVE3) is selected as
**** the bestpath? From the example we can't tell which would be
**** preferred. I think ECMP applies only if both RT-5s are equally
**** preferable.
(4) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table and Overlay Index=IP2 is found.
**** Assuming that the route from NVE2 was preferred for some reason to the
**** route from NVE3??
Since IP2 is an
Overlay Index a recursive route resolution is required for
IP2.
o IP2 is resolved to M2 in the ARP table, and M2 is resolved to
the tunnel information given by the MAC-VRF FIB (e.g. remote
VTEP and VNI for the VXLAN case).
o The IP packet destined to IPx is encapsulated with:
. Source inner MAC = IRB1 MAC.
. Destination inner MAC = M2.
. Tunnel information provided by the MAC-VRF (VNI, VTEP IPs
and MACs for the VXLAN case).
(5) When the packet arrives at NVE2:
o Based on the tunnel information (VNI for the VXLAN case), the
MAC-VRF10 context is identified for a MAC lookup.
o Encapsulation is stripped-off and based on a MAC lookup
(assuming MAC forwarding on the egress NVE), the packet is
forwarded to TS2, where it will be properly routed.
(6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will
be applied to the MAC route IP2/M2, as defined in [RFC7432].
Route type 5 prefixes are not subject to MAC mobility procedures,
hence no changes in the DGW IP-VRF routing table will occur for
TS2 mobility, i.e. all the prefixes will still be pointing at IP2
as Overlay Index. There is an indirection for e.g. SN1/24, which
still points at Overlay Index IP2 in the routing table, but IP2
will be simply resolved to a different tunnel, based on the
outcome of the MAC mobility procedures for the MAC/IP route
IP2/M2.
Note that in the opposite direction, TS2 will send traffic based on
its static-route next-hop information (IRB1 and/or IRB2), and regular
EVPN procedures will be applied.
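The lookup chain in steps (3) through (6) can be sketched as follows.
This is an illustration only, assuming plain dictionaries stand in for
the DGW1 IP-VRF, ARP table and MAC-VRF10 FIB. The names (ip_vrf,
arp_table, mac_fib, resolve), the 10.1.1.0/24 value used for SN1 and the
VNI value are hypothetical and do not come from the draft.

```python
import ipaddress

# Hypothetical DGW1 state after importing the RT-2 and RT-5 from NVE2.
ip_vrf = {ipaddress.ip_network("10.1.1.0/24"): "IP2"}  # SN1/24 -> Overlay Index
arp_table = {"IP2": "M2"}                               # from the RT-2 for IP2/M2
mac_fib = {"M2": {"vtep": "NVE2", "vni": 10}}           # tunnel info for M2


def resolve(dst):
    """Return (inner destination MAC, tunnel info) for dst, or None."""
    addr = ipaddress.ip_address(dst)
    # Longest-prefix match in the IP-VRF.
    matches = [net for net in ip_vrf if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    overlay_index = ip_vrf[best]
    # Recursive resolution: Overlay Index -> MAC -> tunnel.
    mac = arp_table.get(overlay_index)
    if mac is None or mac not in mac_fib:
        return None
    return mac, mac_fib[mac]


before = resolve("10.1.1.7")
# Step (6): TS2 moves to NVE3. Only the MAC-VRF entry changes;
# the IP-VRF entry for SN1/24 still points at IP2.
mac_fib["M2"] = {"vtep": "NVE3", "vni": 10}
after = resolve("10.1.1.7")
```

In this sketch the mobility event touches only mac_fib, which mirrors the
indirection described in step (6): the prefix keeps its Overlay Index and
only the tunnel that the index resolves to changes.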
4.2 Floating IP Overlay Index use-case
Sometimes Tenant Systems (TS) work in active/standby mode where an
upstream floating IP - owned by the active TS - is used as the
Overlay Index to get to some subnets behind. This redundancy mode,
already introduced in sections 2.1 and 2.2, is illustrated in Figure
3.
NVE2 DGW1
+-----------+ +---------+ +-------------+
+---TS2(VA)--|(MAC-VRF10)|-| |----|(MAC-VRF10) |
| IP2/M2 +-----------+ | | | IRB1\ |
| <-+ | | | (IP-VRF)|---+
| | | | +-------------+ _|_
SN1 vIP23 (floating) | VXLAN/ | ( )
| | | nvGRE | DGW2 ( WAN )
| <-+ NVE3 | | +-------------+ (___)
| IP3/M3 +-----------+ | |----|(MAC-VRF10) | |
+---TS3(VA)--|(MAC-VRF10)|-| | | IRB2\ | |
+-----------+ +---------+ | (IP-VRF)|---+
+-------------+
Figure 3 Floating IP Overlay Index for redundant TS
In this use-case, a GW IP is used as an Overlay Index for the same
reasons as in 4.1. However, this GW IP is a floating IP that belongs
to the active TS. Assuming TS2 is the active TS and owns IP23:
(1) NVE2 advertises the following BGP routes for TS2:
o Route type 2 (MAC/IP route) containing: ML=48, M=M2, IPL=32,
IP=IP23 (and BGP Encapsulation Extended Community). The MAC
and IP addresses may be learned via ARP-snooping.
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP23. The prefix and GW IP are learned by
policy.
(2) NVE3 advertises the following BGP routes for TS3:
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=0, GW IP address=IP23. The prefix and GW IP are learned by
policy.
**** It might be worth mentioning that NVE3 does not advertise an RT-2 for
**** IP23/M3.
(3) DGW1 and DGW2 import both received routes based on the route-
target:
o M2 is added to the MAC-VRF10 FIB along with its corresponding
tunnel information. For the VXLAN use case, the VTEP will be
derived from the MAC/IP route BGP next-hop and VNI from the
VNI/VSID field. IP23 - M2 is added to the ARP table.
o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay
index IP23 pointing at the local MAC-VRF10.
**** Should it be "pointing at M2 in the local MAC-VRF10"?
(4) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table and Overlay Index=IP23 is found. Since IP23 is
an Overlay Index, a recursive route resolution for IP23 is
required.
o IP23 is resolved to M2 in the ARP table, and M2 is resolved to
the tunnel information given by the MAC-VRF (remote VTEP and
VNI for the VXLAN case).
o The IP packet destined to IPx is encapsulated with:
. Source inner MAC = IRB1 MAC.
. Destination inner MAC = M2.
. Tunnel information provided by the MAC-VRF FIB (VNI, VTEP
IPs and MACs for the VXLAN case).
(5) When the packet arrives at NVE2:
o Based on the tunnel information (VNI for the VXLAN case), the
MAC-VRF10 context is identified for a MAC lookup.
o Encapsulation is stripped-off and based on a MAC lookup
(assuming MAC forwarding on the egress NVE), the packet is
forwarded to TS2, where it will be properly routed.
(6) When the redundancy protocol running between TS2 and TS3 appoints
TS3 as the new active TS for SN1, TS3 will now own the floating
IP23 and will signal this new ownership (GARP message or
similar). Upon receiving the new owner's notification, NVE3 will
issue a route type 2 for M3-IP23 (and NVE2 will withdraw the RT-2
for M2-IP23).
**** I'd remove the parentheses in the last sentence, as the withdrawal by
**** NVE2 is pretty important (unless TS2 actually goes down). BTW, if
**** NVE2's withdrawal is received before NVE3's update, you can get a short
**** black hole (i.e., there doesn't seem to be any "make before break";
**** don't know if that's an issue or not).
DGW1 and DGW2 will update their ARP tables with the
new MAC resolving the floating IP. No changes are made in the IP-
VRF routing table.
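The ordering concern raised in the comment on step (6) can be made
concrete with a small sketch. It assumes, as a simplification, that a DGW
keeps a single IP-to-MAC binding per address and applies RT-2 updates and
withdrawals one at a time; the function names and the event tuples are
hypothetical.

```python
def apply(arp_table, event):
    """Apply one RT-2 advertisement or withdrawal for (ip, mac)."""
    kind, ip, mac = event
    if kind == "advertise":
        arp_table[ip] = mac
    elif kind == "withdraw" and arp_table.get(ip) == mac:
        # Only remove the binding if it is still the withdrawn one.
        del arp_table[ip]
    return arp_table


def trace(events):
    """Return what IP23 resolves to after each event (None = unresolved)."""
    arp_table = {"IP23": "M2"}  # state while TS2 is the active TS
    states = []
    for event in events:
        apply(arp_table, event)
        states.append(arp_table.get("IP23"))
    return states


withdraw_first = trace([("withdraw", "IP23", "M2"),
                        ("advertise", "IP23", "M3")])
advertise_first = trace([("advertise", "IP23", "M3"),
                         ("withdraw", "IP23", "M2")])
```

Under these assumptions, withdraw_first passes through an unresolved
state for IP23 (the short black hole mentioned above), while
advertise_first never does. The IP-VRF entry for SN1/24 is not modelled
because it does not change in either ordering.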
4.3 Bump-in-the-wire use-case
Figure 5 illustrates an example of inter-subnet forwarding for an IP
Prefix route that carries a subnet SN1. In this use-case, TS2 and TS3
are layer-2 VA devices without any IP address that can be included as
an Overlay Index in the GW IP field of the IP Prefix route. Their MAC
addresses are M2 and M3 respectively and are connected to EVI-10.
Note that IRB1 and IRB2 (in DGW1 and DGW2 respectively) have IP
addresses in a subnet different than SN1.
NVE2 DGW1
M2 +-----------+ +---------+ +-------------+
+---TS2(VA)--|(MAC-VRF10)|-| |----|(MAC-VRF10) |
| ESI23 +-----------+ | | | IRB1\ |
| + | | | (IP-VRF)|---+
| | | | +-------------+ _|_
SN1 | | VXLAN/ | ( )
| | | nvGRE | DGW2 ( WAN )
| + NVE3 | | +-------------+ (___)
| ESI23 +-----------+ | |----|(MAC-VRF10) | |
+---TS3(VA)--|(MAC-VRF10)|-| | | IRB2\ | |
M3 +-----------+ +---------+ | (IP-VRF)|---+
+-------------+
Figure 5 Bump-in-the-wire use-case
Since neither TS2 nor TS3 can participate in any dynamic routing
protocol and have no IP address assigned, there are two potential
Overlay Index types that can be used when advertising SN1:
a) an ESI, i.e. ESI23, that can be provisioned on the attachment
ports of NVE2 and NVE3, as shown in Figure 5.
b) or the VA's MAC address, that can be added to NVE2 and NVE3 by
policy.
The advantage of using an ESI as Overlay Index as opposed to the VA's
MAC address, is that the forwarding to the egress NVE can be done
purely based on the state of the AC in the ES (notified by the AD
per-EVI route) and all the EVPN multi-homing redundancy mechanisms
can be re-used. For instance, the [RFC7432] mass-withdrawal mechanism
for fast failure detection and propagation can be used. This section
assumes that an ESI Overlay Index is used in this use-case but it
does not prevent the use of the VA's MAC address as an Overlay Index.
If a MAC is used as Overlay Index, the control plane must follow the
procedures described in section 4.4.3.
The model supports VA redundancy in a similar way as the one
described in section 4.2 for the floating IP Overlay Index use-case,
only using the EVPN Ethernet A-D per-EVI route instead of the MAC
**** Perhaps "only using" --> "except that it uses"
advertisement route to advertise the location of the Overlay Index.
The procedure is explained below:
(1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the
following BGP routes:
o Route type 1 (Ethernet A-D route for EVI-10) containing:
ESI=ESI23 and the corresponding tunnel information (VNI/VSID
field), as well as the BGP Encapsulation Extended Community as
per [EVPN-OVERLAY].
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=ESI23, GW IP address=0. The Router's MAC Extended
Community defined in [EVPN-INTERSUBNET] is added and carries
the MAC address (M2) associated to the TS behind which SN1
sits. M2 may be learned by policy.
**** I don't think it's been clear up to now that when the ESI field is
**** non-zero and the GW IP field is zero, the Router's MAC EC must be
**** included.
(2) NVE3 advertises the following BGP routes for TS3:
o Route type 5 (IP Prefix route) containing: IPL=24, IP=SN1,
ESI=ESI23, GW IP address=0. The Router's MAC Extended Community
is added and carries the MAC address (M3) associated to the TS
behind which SN1 sits. M3 may be learned by policy.
(3) DGW1 and DGW2 import the received routes based on the route-
target:
o The tunnel information to get to ESI23 is installed in DGW1
and DGW2. For the VXLAN use case, the VTEP will be derived
from the Ethernet A-D route BGP next-hop and VNI from the
VNI/VSID field (see [EVPN-OVERLAY]).
o SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay
Index ESI23.
(4) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table and Overlay Index=ESI23 is found. Since ESI23 is
an Overlay Index, a recursive route resolution is required to
find the egress NVE where ESI23 resides.
o The IP packet destined to IPx is encapsulated with:
. Source inner MAC = IRB1 MAC.
. Destination inner MAC = M2 (this MAC will be obtained
from the Router's MAC Extended Community received along
with the RT-5 for SN1).
. Tunnel information for the NVO tunnel is provided by the
Ethernet A-D route per-EVI for ESI23 (VNI and VTEP IP for
the VXLAN case).
**** By this procedure, a given DGW chooses between NVE2 and NVE3 based only
**** on the RT-5s. Don't NVE2 and NVE3 each originate an "Ethernet-AD
**** per-EVI route" for ESI23? If so, what stops DGW1 from preferring
**** NVE2's RT-5 while also preferring NVE3's "Ethernet-AD per-EVI route"?
**** Wouldn't the result be that DGW1 tunnels a packet to NVE3 but uses the
**** MAC address M2? If so, will that still work properly? Or have I
**** misunderstood something?
(5) When the packet arrives at NVE2:
o Based on the tunnel demultiplexer information (VNI for the
VXLAN case), the MAC-VRF10 context is identified for a MAC
lookup (assuming MAC disposition model) or the VNI MAY
directly identify the egress interface (for a label or VNI
disposition model).
o Encapsulation is stripped-off and based on a MAC lookup
(assuming MAC forwarding on the egress NVE) or a VNI lookup
(in case of VNI forwarding), the packet is forwarded to TS2,
where it will be forwarded to SN1.
(6) If the redundancy protocol running between TS2 and TS3 follows an
active/standby model and there is a failure, appointing TS3 as
the new active TS for SN1, TS3 will now own the connectivity to
SN1 and will signal this new ownership. Upon receiving the new
owner's notification, NVE3's AC will become active and issue a
route type 1 for ESI23, whereas NVE2 will withdraw its Ethernet
A-D route for ESI23. DGW1 and DGW2 will update their tunnel
information to resolve ESI23. The destination inner MAC will be
changed to M3.
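The question raised in the comment on step (4) can be illustrated with a
sketch in which the RT-5 and the Ethernet A-D per-EVI route are selected
independently. How a DGW actually correlates the two routes is not
stated in the text above, so the independent selection, the dictionary
layout and the function name are assumptions made only for illustration.

```python
# RT-5s for SN1/24 as imported by DGW1, each with its Router's MAC EC.
rt5_routes = [
    {"origin": "NVE2", "esi": "ESI23", "router_mac": "M2"},
    {"origin": "NVE3", "esi": "ESI23", "router_mac": "M3"},
]
ad_from_nve2 = {"origin": "NVE2", "esi": "ESI23", "vni": 10}
ad_from_nve3 = {"origin": "NVE3", "esi": "ESI23", "vni": 10}


def forwarding_entry(rt5_routes, ad_routes, prefer_rt5, prefer_ad):
    """Combine the preferred RT-5 with the preferred A-D route for its ESI."""
    rt5 = next(r for r in rt5_routes if r["origin"] == prefer_rt5)
    candidates = [r for r in ad_routes if r["esi"] == rt5["esi"]]
    ad = next((r for r in candidates if r["origin"] == prefer_ad), None)
    if ad is None:
        return None  # ESI cannot be resolved to a tunnel
    return {"inner_dst_mac": rt5["router_mac"],
            "vtep": ad["origin"],
            "vni": ad["vni"]}


# Only NVE2 advertises the A-D route, as in step (1).
consistent = forwarding_entry(rt5_routes, [ad_from_nve2], "NVE2", "NVE2")
# Both NVEs advertise an A-D route and the two selections disagree.
mismatch = forwarding_entry(rt5_routes, [ad_from_nve2, ad_from_nve3],
                            "NVE2", "NVE3")
```

With independent selection, mismatch pairs the tunnel to NVE3 with inner
destination MAC M2, which is the combination the comment asks about.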
4.4 IP-VRF-to-IP-VRF model
This use-case is similar to the scenario described in "IRB forwarding
on NVEs for Tenant Systems" in [EVPN-INTERSUBNET], however the new
requirement here is the advertisement of IP Prefixes as opposed to
only host routes.
In the examples described in sections 4.1, 4.2 and 4.3, the MAC-VRF
instance can connect IRB interfaces and any other Tenant Systems
connected to it. EVPN provides connectivity for:
1. Traffic destined to the IRB or TS IP interfaces as well as
2. Traffic destined to IP subnets sitting behind the TS, e.g. SN1 or
SN2.
In order to provide connectivity for (1), MAC/IP routes (RT-2) are
needed so that IRB or TS MACs and IPs can be distributed.
Connectivity type (2) is accomplished by the exchange of IP Prefix
routes (RT-5) for IPs and subnets sitting behind certain Overlay
Indexes, e.g. GW IP or ESI.
In some cases, IP Prefix routes may be advertised for subnets and IPs
sitting behind an IRB, and EVPN is the only enabled SAFI in the
network.
**** In this revision, you've stated already that the IP-VRFs are populated
**** only by EVPN routes, so I don't think the "EVPN is the only enabled
**** SAFI" is needed.
We refer to this use-case as the "IP-VRF-to-IP-VRF" model.
[EVPN-INTERSUBNET] defines an asymmetric IRB model and a symmetric
IRB model, based on the required lookups at the ingress and egress
NVE: the asymmetric model requires an ip-lookup and a mac-lookup at
the ingress NVE, whereas only a mac-lookup is needed at the egress
NVE; the symmetric model requires ip and mac lookups at both the ingress
and egress NVEs. From that perspective, the IP-VRF-to-IP-VRF use-case
described in this section is a symmetric IRB model.
Note that, in an IP-VRF-to-IP-VRF scenario, out of the many subnets
that a tenant may have, only a few are attached to a given NVE/PE's
**** Perhaps "only a few are attached" --> "it may be the case that only a
**** few are attached"
IP-VRF. In order to provide inter-subnet connectivity across multiple
NVE/PEs, a shared core EVI may be configured in all the tenant
NVE/PEs. This core EVI has a core-facing IRB interface that connects
the core MAC-VRF to the IP-VRF on each NVE/PE.
**** I get (I think) that the "core EVI" has no ACs, and is only used to
**** carry IP traffic across the core. I'm not sure what "core-facing IRB
**** interface" means or what "core MAC-VRF" means. Also, the last sentence
**** above seems to exclude the interface-less model.
Based on the
characteristics of this core-facing IRB interface, there are three
different IP-VRF-to-IP-VRF scenarios identified and described in this
document:
1) Interface-less model
**** So "based on the characteristics of the core-facing IRB interface", we
**** might have an "interface-less" model. Is that because one of the
**** characteristics might be "non-existence"? ;-)
2) Interface-full with core-facing IRB model
**** If you look at terms like "careful/careless", "stateful/stateless",
**** "thoughtful/thoughtless", I think you'll realize that "interface-full"
**** should be "interface-ful". But I'll leave that for you to discuss with
**** the RFC Editor ;-)
3) Interface-full with unnumbered core-facing IRB model
**** Okay, I think I see what's going on here now. To support inter-subnet
**** forwarding among a set of NVEs/PEs, you propose to create a new
**** "inter-subnet" EVI on all those NVE/PEs, and then to use the tunnels of
**** that EVI to carry the inter-subnet traffic. If the tunnels are NVO
**** ethernet tunnels, this is analogous to the "Supplementary Broadcast
**** Domain" from draft-lin. If the tunnels are NVO IP tunnels, then this
**** new EVI is more of a "Supplementary IP Subnet" that exists on all the
**** NVEs/PEs. This part seems to be common to all three models.
**** It's hard to say how this model is going to interact with the
**** developing procedures for support of EVPN multicast. Please make it
**** clear that support for inter-subnet IP multicast is outside the scope
**** of this document.
4.4.1 Interface-less IP-VRF-to-IP-VRF model
Figure 6 will be used for the description of this model.
NVE1(M1)
+------------+
IP1+----|(MAC-VRF1) | DGW1(M3)
| \ | +---------+ +--------+
| (IP-VRF)|----| |-|(IP-VRF)|----+
| / | | | +--------+ |
+---|(MAC-VRF2) | | | _+_
| +------------+ | | ( )
SN1| | VXLAN/ | ( WAN )
| NVE2(M2) | nvGRE/ | (___)
| +------------+ | MPLS | +
+---|(MAC-VRF2) | | | DGW2(M4) |
| \ | | | +--------+ |
| (IP-VRF)|----| |-|(IP-VRF)|----+
| / | +---------+ +--------+
SN2+----|(MAC-VRF3) |
+------------+
Figure 6 Interface-less IP-VRF-to-IP-VRF model
In this case:
a) The NVEs and DGWs must provide connectivity between hosts in SN1,
SN2, IP1 and hosts sitting at the other end of the WAN.
**** It would be good to show some hosts "sitting at the other end of the
**** WAN". The existence of such hosts suggests that the DGWs are importing
**** IP and/or VPN-IP routes into their IP-VRFs. However, this document has
**** declared such a situation to be out of scope, and the beginning of
**** section 4.4 says that EVPN is the only enabled SAFI. So I'm a bit
**** confused about this part of the scenario.
b) The IP-VRF instances in the NVE/DGWs are directly connected
through NVO tunnels, and no IRBs and/or MAC-VRF instances are
instantiated to connect the IP-VRFs.
c) The solution must provide layer-3 connectivity among the IP-VRFs
for Ethernet NVO tunnels, for instance, VXLAN or nvGRE.
**** If ethernet NVO tunnels are used, then when the DGW receives a frame
**** from one of those tunnels it removes the ethernet header, does an IP
**** lookup, and adjusts TTL and checksum. That certainly sounds like what
**** the DGW would do when receiving a packet over an IRB interface. The
**** salient feature is really that the DGW does not need to do a MAC
**** address lookup, and so there is no need to populate a MAC-VRF with
**** RT-2s.
d) The solution may provide layer-3 connectivity among the IP-VRFs
for IP NVO tunnels, for example, VXLAN GPE (with IP payload).
In order to meet the above requirements, the EVPN route type 5 will
be used to advertise the IP Prefixes, along with the Router's MAC
Extended Community as defined in [EVPN-INTERSUBNET] if the
advertising NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will
advertise an RT-5 for each of its prefixes with the following fields:
o RD as per [RFC7432].
o Eth-Tag ID=0.
o IP address length and IP address, as explained in the previous
sections.
o GW IP address=0.
o ESI=0
o MPLS label or VNI corresponding to the IP-VRF.
Each RT-5 will be sent with a route-target identifying the tenant
(IP-VRF) and two BGP extended communities:
o The first one is the BGP Encapsulation Extended Community, as
per [RFC5512], identifying the tunnel type.
o The second one is the Router's MAC Extended Community as per
[EVPN-INTERSUBNET] containing the MAC address associated to
the NVE advertising the route. This MAC address identifies the
NVE/DGW and MAY be re-used for all the IP-VRFs in the NVE. The
Router's MAC Extended Community MUST be sent if the route is
associated to an Ethernet NVO tunnel, for instance, VXLAN. If
the route is associated to an IP NVO tunnel, for instance
VXLAN GPE with IP payload, the Router's MAC Extended Community
SHOULD NOT be sent.
The following example illustrates the procedure to advertise and
forward packets to SN1/24 (ipv4 prefix advertised from NVE1):
(1) NVE1 advertises the following BGP route:
o Route type 5 (IP Prefix route) containing:
. IPL=24, IP=SN1, Label=10.
. GW IP= SHOULD be set to 0.
. [RFC5512] BGP Encapsulation Extended Community.
**** In this example, that EC would identify "VXLAN"?
. Router's MAC Extended Community that contains M1.
. Route-target identifying the tenant (IP-VRF).
(2) DGW1 imports the received routes from NVE1:
o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
route-target.
o Since GW IP=0 and the Label is a valid value, DGW1 will use
**** Please define "valid value". I think you just mean "non-zero value".
the Label and next-hop of the RT-5, as well as the MAC address
conveyed in the Router's MAC Extended Community (as inner
destination MAC address) to set up the forwarding state and
later encapsulate the routed IP packets.
**** This means that if a BGP speaker propagates the RT-5 route, and if that
**** speaker changes the BGP next hop, it also needs to change the label
**** value. Or in other words, this probably won't work as stated in an
**** Option B interconnect scenario. Unless I'm missing something, you need
**** to state clearly that this procedure doesn't work as described if the
**** BGP next hop changes.
(3) When DGW1 receives a packet from the WAN with destination IPx,
**** I notice you don't mention the case where DGW1 receives a packet from
**** NVE1 or NVE2 that is addressed to a host that is somewhere out on the
**** WAN. But you do mention that that is part of the scenario. Should
**** this case be covered?
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table. The lookup yields SN1/24.
o Since the RT-5 for SN1/24 had a GW IP=0 and a valid Label and
next-hop, DGW1 will not need a recursive lookup to resolve the
route.
o The IP packet destined to IPx is encapsulated with: Source
inner MAC = DGW1 MAC, Destination inner MAC = M1, Source outer
IP (source VTEP) = DGW1 IP, Destination outer IP (destination
VTEP) = NVE1 IP. The Source and Destination inner MAC
addresses are not needed if IP NVO tunnels are used.
(4) When the packet arrives at NVE1:
o NVE1 will identify the IP-VRF for an IP-lookup based on the
Label (the Destination inner MAC is not needed to identify the
IP-VRF).
**** Why does NVE1 have to signal the destination inner MAC to DGW1 if NVE1
**** is not going to look at the destination inner MAC field when it
**** receives frames from DGW1?
o An IP lookup is performed in the routing context, where SN1
turns out to be a local subnet associated to MAC-VRF2. A
subsequent lookup in the ARP table and the MAC-VRF FIB will
provide the forwarding information for the packet in MAC-VRF2.
The model described above is called Interface-less model since the
IP-VRFs are connected directly through tunnels and they don't require
those tunnels to be terminated in core MAC-VRFs instead, like in
sections 4.4.2 or 4.4.3. An EVPN IP-VRF-to-IP-VRF implementation is
REQUIRED to support the ingress and egress procedures described in
this section.
**** I know you don't want to change the names of the schemes, but I think
**** this is really the "No Overlay Index" (or "indexless" ;-)) model. The
**** crucial feature of this use case, I think, is that since there is no
**** Overlay Index, there is no need for EVPN recursive resolution, hence no
**** need for a MAC-VRF on the DGWs.
**** Of course, one could imagine a DGW that also functioned as an NVE (with
**** ACs to a particular BD), but did inter-subnet unicast routing without
**** use of an Overlay Index. Would that be considered to be the
**** interface-less model, even though there has to be a MAC-VRF for the
**** attached BD?
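The absence of recursive resolution in section 4.4.1 can be sketched as
follows. This is a minimal illustration, assuming that a "valid" Label
means a non-zero one (as suggested in the comment on step (2)) and that
None represents a GW IP of zero. The RT5 class, the encapsulate function
and the "DGW1-MAC" and "DGW1-IP" placeholders are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RT5:
    prefix: str
    next_hop: str
    label: int
    gw_ip: Optional[str] = None       # None stands for GW IP = 0
    router_mac: Optional[str] = None  # Router's MAC EC, if present


def encapsulate(route, local_mac, local_vtep, ethernet_tunnel=True):
    """Build the encapsulation directly from the RT-5, with no recursion."""
    if route.gw_ip is not None or route.label == 0:
        raise ValueError("not an interface-less route")
    encap = {"outer_src_ip": local_vtep,
             "outer_dst_ip": route.next_hop,
             "label_or_vni": route.label}
    if ethernet_tunnel:
        # Ethernet NVO tunnels need inner MACs; IP NVO tunnels do not.
        if route.router_mac is None:
            raise ValueError("Ethernet NVO tunnel needs the Router's MAC EC")
        encap["inner_src_mac"] = local_mac
        encap["inner_dst_mac"] = route.router_mac
    return encap


rt5 = RT5(prefix="SN1/24", next_hop="NVE1", label=10, router_mac="M1")
eth = encapsulate(rt5, "DGW1-MAC", "DGW1-IP")
ip_only = encapsulate(rt5, "DGW1-MAC", "DGW1-IP", ethernet_tunnel=False)
```

Every value in the resulting encapsulation comes from the RT-5 itself or
from local DGW1 state; no ARP table or MAC-VRF FIB is consulted.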
4.4.2 Interface-full IP-VRF-to-IP-VRF with core-facing IRB
Figure 7 will be used for the description of this model.
NVE1
+------------+ DGW1
IP1+----+(MAC-VRF1) | +---------------+ +------------+
| \ (core) (core) |
|(IP-VRF)(MAC-VRF) (MAC-VRF)(IP-VRF)|-----+
| / IRB(IP1/M1) IRB(IP3/M3) | |
+---+(MAC-VRF2) | | | +------------+ _+_
| +------------+ | | ( )
SN1| | VXLAN/ | ( WAN )
| NVE2 | nvGRE/ | (___)
| +------------+ | MPLS | DGW2 +
+---+(MAC-VRF2) | | | +------------+ |
| \ (core) (core) | |
|(IP-VRF)(MAC-VRF) (MAC-VRF)(IP-VRF)|-----+
| / IRB(IP2/M2) IRB(IP4/M4) |
SN2+----+(MAC-VRF3) | +---------------+ +------------+
+------------+
Figure 7 Interface-full with core-facing IRB model
In this model:
a) As in section 4.4.1, the NVEs and DGWs must provide connectivity
between hosts in SN1, SN2, IP1 and hosts sitting at the other end
of the WAN.
b) However, the NVE/DGWs are now connected through Ethernet NVO
tunnels terminated in core-MAC-VRF instances. The IP-VRFs use IRB
interfaces for their connectivity to the core MAC-VRFs.
c) Each core-facing IRB has an IP and a MAC address, where the IP
address must be reachable from other NVEs or DGWs.
d) The core EVI is composed of the NVE/DGW MAC-VRFs and may contain
other MAC-VRFs without IRB interfaces. Those non-IRB MAC-VRFs will
typically connect TSes that need layer-3 connectivity to remote
subnets.
e) The solution must provide layer-3 connectivity for Ethernet NVO
tunnels, for instance, VXLAN or nvGRE.
EVPN type 5 routes will be used to advertise the IP Prefixes, whereas
EVPN RT-2 routes will advertise the MAC/IP addresses of each core-
facing IRB interface. Each NVE/DGW will advertise an RT-5 for each of
its prefixes with the following fields:
o RD as per [RFC7432].
o Eth-Tag ID=0.
o IP address length and IP address, as explained in the previous
sections.
o GW IP address=IRB-IP (this is the Overlay Index that will be
used for the recursive route resolution).
o ESI=0
o Label value SHOULD be zero since the RT-5 route requires a
recursive lookup resolution to an RT-2 route. The MPLS label
or VNI to be used when forwarding packets will be derived from
the RT-2's MPLS Label1 field. The RT-5's Label field will be
ignored on reception.
**** If the RT-5's label field is not zero, how do you know that you're
**** supposed to ignore it? Because the GW IP field is non-zero? This goes
**** back to a previous comment, that it would be great to have a table that
**** tells you, for each combination of GW IP, ESI, Label, and
**** presence/absence of Router's MAC EC, whether you have an Overlay Index
**** and if so, which sort you have.
Each RT-5 will be sent with a route-target identifying the tenant
(IP-VRF). The Router's MAC Extended Community SHOULD NOT be sent in
this case.
The following example illustrates the procedure to advertise and
forward packets to SN1/24 (ipv4 prefix advertised from NVE1):
(1) NVE1 advertises the following BGP routes:
o Route type 5 (IP Prefix route) containing:
. IPL=24, IP=SN1, Label= SHOULD be set to 0.
. GW IP=IP1 (core-facing IRB's IP)
**** Figure 7 also shows IP1 as an external system. Perhaps a cut-and-paste
**** error from Figure 6?
. Route-target identifying the tenant (IP-VRF).
o Route type 2 (MAC/IP route for the core-facing IRB)
containing:
. ML=48, M=M1, IPL=32, IP=IP1, Label=10.
. A [RFC5512] BGP Encapsulation Extended Community.
. Route-target identifying the core MAC-VRF. This route-target
MAY be the same as the one used with the RT-5.
(2) DGW1 imports the received routes from NVE1:
o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
route-target.
. Since GW IP is different from zero, the GW IP (IP1) will be
used as the Overlay Index for the recursive route resolution
to the RT-2 carrying IP1.
(3) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table. The lookup yields SN1/24, which is associated
to the Overlay Index IP1. The forwarding information is
derived from the RT-2 received for IP1.
o The IP packet destined to IPx is encapsulated with: Source
inner MAC = M3, Destination inner MAC = M1, Source outer IP
(source VTEP) = DGW1 IP, Destination outer IP (destination
VTEP) = NVE1 IP.
(4) When the packet arrives at NVE1:
o NVE1 will identify the IP-VRF for an IP-lookup based on the
Label and the inner MAC DA.
o An IP lookup is performed in the routing context, where SN1
turns out to be a local subnet associated to MAC-VRF2. A
subsequent lookup in the ARP table and the MAC-VRF FIB will
provide the forwarding information for the packet in MAC-VRF2.
The model described above is called Interface-full with core-facing
IRB model since the tunnels connecting the DGWs and NVEs need to be
terminated into the core MAC-VRFs. Those MAC-VRFs are connected to
the IP-VRFs via core-facing IRB interfaces. An EVPN IP-VRF-to-IP-VRF
implementation is REQUIRED to support the ingress and egress
procedures described in this section.
**** It seems to me that the crucial feature of this example is that in
**** order for DGW1 to reach SN1, it uses IP1 as an overlay index, hence
**** needs to have an RT2 for recursive resolution, hence needs a MAC-VRF.
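The dependency on the RT-2 in section 4.4.2 can be sketched as follows.
This is an illustration only, assuming routes are held as dictionaries
keyed by the fields named in the example; resolve_via_rt2 and rt2_by_ip
are hypothetical names.

```python
def resolve_via_rt2(rt5, rt2_by_ip):
    """Resolve an RT-5 whose GW IP is the Overlay Index, using an RT-2."""
    rt2 = rt2_by_ip.get(rt5["gw_ip"])
    if rt2 is None:
        return None  # Overlay Index cannot be resolved yet
    # The RT-5 Label is ignored; the label comes from the RT-2.
    return {"inner_dst_mac": rt2["mac"],
            "vtep": rt2["next_hop"],
            "label": rt2["label"]}


rt2_by_ip = {"IP1": {"mac": "M1", "next_hop": "NVE1", "label": 10}}
with_zero_label = resolve_via_rt2(
    {"prefix": "SN1/24", "gw_ip": "IP1", "label": 0}, rt2_by_ip)
with_other_label = resolve_via_rt2(
    {"prefix": "SN1/24", "gw_ip": "IP1", "label": 99}, rt2_by_ip)
unresolved = resolve_via_rt2(
    {"prefix": "SN1/24", "gw_ip": "IP9", "label": 0}, rt2_by_ip)
```

Both with_zero_label and with_other_label yield the same forwarding
state, since the sketch takes the label only from the RT-2, in line with
"The RT-5's Label field will be ignored on reception".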
4.4.3 Interface-full IP-VRF-to-IP-VRF with unnumbered core-facing IRB
Figure 8 will be used for the description of this model. Note that
this model is similar to the one described in section 4.4.2, only
without IP addresses on the core-facing IRB interfaces.
NVE1
+------------+ DGW1
IP1+----+(MAC-VRF1) | +---------------+ +------------+
| \ (core) (core) |
|(IP-VRF)(MAC-VRF) (MAC-VRF)(IP-VRF)|-----+
| / IRB(M1)| | IRB(M3) | |
+---+(MAC-VRF2) | | | +------------+ _+_
| +------------+ | | ( )
SN1| | VXLAN/ | ( WAN )
| NVE2 | nvGRE/ | (___)
| +------------+ | MPLS | DGW2 +
+---+(MAC-VRF2) | | | +------------+ |
| \ (core) (core) | |
|(IP-VRF)(MAC-VRF) (MAC-VRF)(IP-VRF)|-----+
| / IRB(M2)| | IRB(M4) |
SN2+----+(MAC-VRF3) | +---------------+ +------------+
+------------+
Figure 8 Interface-full with unnumbered core-facing IRB model
In this model:
a) As in sections 4.4.1 and 4.4.2, the NVEs and DGWs must provide
connectivity between hosts in SN1, SN2, IP1 and hosts sitting at
the other end of the WAN.
b) As in section 4.4.2, the NVE/DGWs are connected through Ethernet
NVO tunnels terminated in core-MAC-VRF instances. The IP-VRFs use
IRB interfaces for their connectivity to the core MAC-VRFs.
c) However, each core-facing IRB has a MAC address only, and no IP
address (that is why the model refers to an 'unnumbered' core-
facing IRB). In this model, there is no need to have IP
reachability to the core-facing IRB interfaces themselves and
there is a requirement to save IP addresses on those interfaces.
d) As in section 4.4.2, the core EVI is composed of the NVE/DGW MAC-
VRFs and may contain other MAC-VRFs.
e) As in section 4.4.2, the solution must provide layer-3
connectivity for Ethernet NVO tunnels, for instance, VXLAN or
nvGRE.
This model will also make use of the RT-5 recursive resolution. EVPN
type 5 routes will advertise the IP Prefixes along with the Router's
MAC Extended Community used for the recursive lookup, whereas EVPN
RT-2 routes will advertise the MAC addresses of each core-facing IRB
interface (this time without an IP).
Each NVE/DGW will advertise an RT-5 for each of its prefixes with the
same fields as described in 4.4.2 except for:
o GW IP address: SHOULD be set to 0.
Each RT-5 will be sent with a route-target identifying the tenant
(IP-VRF) and the Router's MAC Extended Community containing the MAC
address associated with the core-facing IRB interface. This MAC
address MAY be re-used for all the IP-VRFs in the NVE.
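The advertisement rules above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the draft's
wire encoding: the class names (IpPrefixRoute, MacRoute), the advertise()
helper, the route-target strings and the sample prefixes are all
hypothetical. Only the field settings come from the text: GW IP set to 0,
the Router's MAC Extended Community carrying the core-facing IRB MAC, and
an RT-2 that carries a MAC but no IP.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Dict, List, Tuple, Union

Prefix = Union[IPv4Network, IPv6Network]


@dataclass(frozen=True)
class IpPrefixRoute:
    """Illustrative RT-5 as sent in the unnumbered core-facing IRB model."""
    prefix: Prefix
    route_target: str   # identifies the tenant (IP-VRF)
    router_mac: str     # Router's MAC Extended Community
    gw_ip: int = 0      # GW IP address is set to 0 in this model


@dataclass(frozen=True)
class MacRoute:
    """Illustrative RT-2 for the core-facing IRB: a MAC with no IP."""
    mac: str
    mac_len: int = 48   # ML=48
    ip_len: int = 0     # IPL=0, i.e. no IP address is carried


def advertise(irb_mac: str,
              tenants: Dict[str, List[str]]
              ) -> Tuple[List[IpPrefixRoute], MacRoute]:
    """Build one RT-5 per prefix, plus the RT-2 for the IRB MAC.

    `tenants` maps a (hypothetical) route-target string to the prefixes of
    that IP-VRF. The same `irb_mac` is reused for every IP-VRF, which the
    text permits (MAY). How many core MAC-VRFs carry the RT-2 is not
    modelled here.
    """
    rt5_routes = [
        IpPrefixRoute(ip_network(prefix), route_target, irb_mac)
        for route_target, prefixes in tenants.items()
        for prefix in prefixes
    ]
    return rt5_routes, MacRoute(irb_mac)


# Hypothetical tenants and documentation prefixes, with M1 as the IRB MAC.
rt5_routes, rt2_route = advertise(
    "M1",
    {"target:tenant-a": ["192.0.2.0/24"],
     "target:tenant-b": ["198.51.100.0/24"]},
)
```

Both RT-5s carry the same Router's MAC (M1) but different route-targets,
so the receiver can place each prefix in the right IP-VRF while resolving
all of them through the single RT-2 for M1.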
The example is similar to the one in section 4.4.2:
(1) NVE1 advertises the following BGP routes:
o Route type 5 (IP Prefix route) containing the same values as
in the example in section 4.4.2, except for:
. GW IP: SHOULD be set to 0.
. Router's MAC Extended Community containing M1 (this will be
used for the recursive lookup to an RT-2).
o Route type 2 (MAC route for the core-facing IRB) with the same
values as in section 4.4.2 except for:
. ML=48, M=M1, IPL=0, Label=10.
(2) DGW1 imports the received routes from NVE1:
o DGW1 installs SN1/24 in the IP-VRF identified by the RT-5
route-target.
. The MAC contained in the Router's MAC Extended Community
sent along with the RT-5 (M1) will be used as the Overlay
Index for the recursive route resolution to the RT-2
carrying M1.
(3) When DGW1 receives a packet from the WAN with destination IPx,
where IPx belongs to SN1/24:
o A destination IP lookup is performed on the DGW1 IP-VRF
routing table. The lookup yields SN1/24, which is associated
with the Overlay Index M1. The forwarding information is derived
from the RT-2 received for M1.
o The IP packet destined to IPx is encapsulated with: Source
inner MAC = M3, Destination inner MAC = M1, Source outer IP
(source VTEP) = DGW1 IP, Destination outer IP (destination
VTEP) = NVE1 IP.
(4) When the packet arrives at NVE1:
o NVE1 will identify the IP-VRF for an IP-lookup based on the
Label and the inner MAC DA.
o An IP lookup is performed in the routing context, where SN1
turns out to be a local subnet associated with MAC-VRF2. A
subsequent lookup in the ARP table and the MAC-VRF FIB will
provide the forwarding information for the packet in MAC-VRF2.
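Steps (2) and (3) above, as seen from DGW1, can be sketched roughly as
follows. This is a minimal illustration, not an implementation of the
draft: the Dgw class, its method names and the Encap record are
hypothetical, MACs and tunnel endpoints are kept as the symbolic names
used in the example (M1, M3, "NVE1-IP", "DGW1-IP"), and 192.0.2.0/24
merely stands in for SN1/24. Carrying the RT-2 label in the encapsulation
is an assumption drawn from step (4), where NVE1 uses the Label to find
the IP-VRF.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class MacEntry:
    """Forwarding information learned from an RT-2."""
    vtep_ip: str   # tunnel endpoint that advertised the MAC
    label: int     # Label carried in the RT-2


@dataclass(frozen=True)
class Encap:
    inner_src_mac: str
    inner_dst_mac: str
    outer_src_ip: str
    outer_dst_ip: str
    label: int


class Dgw:
    def __init__(self, vtep_ip: str, irb_mac: str) -> None:
        self.vtep_ip = vtep_ip
        self.irb_mac = irb_mac
        self.ip_vrf = {}        # prefix -> Overlay Index (a MAC)
        self.core_mac_vrf = {}  # MAC -> MacEntry

    def import_rt5(self, prefix: str, router_mac: str) -> None:
        # The Router's MAC Extended Community becomes the Overlay Index.
        self.ip_vrf[ip_network(prefix)] = router_mac

    def import_rt2(self, mac: str, vtep_ip: str, label: int) -> None:
        self.core_mac_vrf[mac] = MacEntry(vtep_ip, label)

    def forward(self, dst_ip: str) -> Optional[Encap]:
        dst = ip_address(dst_ip)
        matches = [p for p in self.ip_vrf if dst in p]
        if not matches:
            return None
        best = max(matches, key=lambda p: p.prefixlen)  # longest match
        overlay_mac = self.ip_vrf[best]
        entry = self.core_mac_vrf.get(overlay_mac)      # recursive lookup
        if entry is None:
            return None  # RT-5 stays unresolved until the RT-2 is known
        return Encap(self.irb_mac, overlay_mac,
                     self.vtep_ip, entry.vtep_ip, entry.label)


dgw1 = Dgw(vtep_ip="DGW1-IP", irb_mac="M3")
dgw1.import_rt5("192.0.2.0/24", router_mac="M1")        # stands in for SN1/24
unresolved = dgw1.forward("192.0.2.10")                 # RT-2 not yet known
dgw1.import_rt2("M1", vtep_ip="NVE1-IP", label=10)
resolved = dgw1.forward("192.0.2.10")
```

Before the RT-2 for M1 is imported the prefix cannot be resolved, which
is why the sketch returns None in that case. Once it is imported, the
result matches the encapsulation listed in step (3): inner MACs M3 and
M1, outer IPs of DGW1 and NVE1.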
The model described above is called Interface-full with core-facing
IRB model (as in section 4.4.2), only this time the core-facing IRB
does not have an IP address. This model is OPTIONAL for an EVPN IP-
VRF-to-IP-VRF implementation.
**** So this is really the same as the previous case, except: (a) instead of
**** the RT-5 having an IP address as overlay index, it has a Router's MAC
**** EC as overlay index, (b) the RT2 advertising NVE1's MAC address
**** doesn't advertise a corresponding IP address.
**** So the three cases in section 4.4 are really (a) no Overlay Index, (b)
**** Overlay Index is IP address, and (c) Overlay Index is MAC address. Of
**** course, one might say that this is three solutions to one problem, and
**** ask why we need three solutions. But since this document's been around
**** for a while, I won't ask that question.
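The three cases listed in the comment above can be put side by side in a
short sketch. It rests on assumptions: the OverlayIndex enum, the Rt5 and
Rt2 records and resolve() are hypothetical names, and the kind of Overlay
Index is passed in explicitly because the text reviewed here does not
state how a receiver determines it from the route alone. The per-case
behaviour follows the document: with no Overlay Index the BGP next-hop
and the Router's MAC are used directly, otherwise an RT-2 is needed for
the recursive resolution.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class OverlayIndex(Enum):
    NONE = "none"        # case (a): no Overlay Index
    GW_IP = "gw-ip"      # case (b): Overlay Index is an IP address
    ROUTER_MAC = "mac"   # case (c): Overlay Index is a MAC address


@dataclass(frozen=True)
class Rt5:
    bgp_next_hop: str
    gw_ip: Optional[str] = None
    router_mac: Optional[str] = None


@dataclass(frozen=True)
class Rt2:
    mac: str
    bgp_next_hop: str
    ip: Optional[str] = None


def resolve(kind: OverlayIndex, rt5: Rt5,
            rt2_routes: List[Rt2]) -> Optional[Tuple[str, str]]:
    """Return (inner destination MAC, egress endpoint), or None."""
    if kind is OverlayIndex.NONE:
        # No recursion: next-hop and Router's MAC come from the RT-5.
        if rt5.router_mac is None:
            return None
        return rt5.router_mac, rt5.bgp_next_hop
    if kind is OverlayIndex.GW_IP:
        match = next((r for r in rt2_routes
                      if r.ip is not None and r.ip == rt5.gw_ip), None)
    else:
        match = next((r for r in rt2_routes
                      if r.mac == rt5.router_mac), None)
    return (match.mac, match.bgp_next_hop) if match else None


# Symbolic values only; "IRB-IP" is a placeholder for a numbered IRB address.
case_a = resolve(OverlayIndex.NONE,
                 Rt5("NVE1-IP", router_mac="M1"), [])
case_b = resolve(OverlayIndex.GW_IP,
                 Rt5("NVE1-IP", gw_ip="IRB-IP"),
                 [Rt2("M1", "NVE1-IP", ip="IRB-IP")])
case_c = resolve(OverlayIndex.ROUTER_MAC,
                 Rt5("NVE1-IP", router_mac="M1"),
                 [Rt2("M1", "NVE1-IP")])
```

Note that cases (a) and (c) both take a Router's MAC from the RT-5 and
differ only in whether an RT-2 lookup follows, which is why the sketch
needs the kind as a separate input.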
5. Conclusions
An EVPN route (type 5) for the advertisement of IP Prefixes is
described in this document. This new route type has a differentiated
role from the RT-2 route and addresses the Data Center (or NVO-based
networks in general) inter-subnet connectivity scenarios described in
this document. Using this new RT-5, an IP Prefix may be advertised
along with an Overlay Index that can be a GW IP address, a MAC or an
ESI, or without an Overlay Index, in which case the BGP next-hop will
point at the egress NVE/ASBR/ABR and the MAC in the Router's MAC
Extended Community will provide the inner MAC destination address to
be used. As discussed throughout the document, the EVPN RT-2 does not
meet the requirements for all the DC use cases, therefore this EVPN
route type 5 is required.
The EVPN route type 5 decouples the IP Prefix advertisements from the
MAC/IP route advertisements in EVPN, hence:
a) Allows the clean and clear advertisement of IPv4 or IPv6 prefixes
in an NLRI with no MAC addresses.
b) Since the route type is different from the MAC/IP Advertisement
route, the current [RFC7432] procedures do not need to be
modified.
c) Allows a flexible implementation where the prefix can be linked to
different types of Overlay Indexes: overlay IP address, overlay
MAC addresses, overlay ESI, underlay BGP next-hops, etc.
d) An EVPN implementation not requiring IP Prefixes can simply
discard them by looking at the route type value.
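Point d) can be illustrated with a small sketch. It relies on the EVPN
NLRI layout from [RFC7432] (a one-octet Route Type, a one-octet Length,
then the route-type-specific bytes); the function names are hypothetical
and the payload bytes in the usage lines are dummy values, not valid
routes.

```python
from typing import List, Tuple

IP_PREFIX_ROUTE_TYPE = 5  # value requested for the IP Prefix route


def split_evpn_nlri(data: bytes) -> List[Tuple[int, bytes]]:
    """Split concatenated EVPN NLRIs into (route type, payload) pairs."""
    routes = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated EVPN NLRI header")
        route_type = data[offset]
        length = data[offset + 1]
        end = offset + 2 + length
        if end > len(data):
            raise ValueError("truncated EVPN NLRI payload")
        routes.append((route_type, data[offset + 2:end]))
        offset = end
    return routes


def drop_ip_prefix_routes(data: bytes) -> List[Tuple[int, bytes]]:
    """Keep every route except type 5, without parsing type 5 contents."""
    return [(route_type, payload)
            for route_type, payload in split_evpn_nlri(data)
            if route_type != IP_PREFIX_ROUTE_TYPE]


# Dummy payloads: a type 2, a type 5 and a type 3 route back to back.
sample = bytes([2, 3, 1, 2, 3,   5, 2, 9, 9,   3, 1, 7])
kept = drop_ip_prefix_routes(sample)
```

Because the Length octet lets a parser skip the route-type-specific part,
the type 5 payload is never inspected; only the type 2 and type 3 entries
remain in `kept`.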
6. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC-2119 [RFC2119].
7. Security Considerations
The security considerations discussed in [RFC7432] apply to this
document.
8. IANA Considerations
This document requests the allocation of value 5 in the "EVPN Route
Types" registry defined by [RFC7432]:
Value Description Reference
5 IP Prefix route [this document]
9. References
9.1 Normative References
[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006,
<http://www.rfc-editor.org/info/rfc4364>.
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet
VPN", RFC 7432, DOI 10.17487/RFC7432, February 2015,
<http://www.rfc-editor.org/info/rfc7432>.
[RFC7606] Chen, E., Scudder, J., Mohapatra, P., and K. Patel, "Revised
Error Handling for BGP UPDATE Messages", RFC 7606, August 2015,
<http://www.rfc-editor.org/info/rfc7606>.
[EVPN-INTERSUBNET] Sajassi et al., "IP Inter-Subnet Forwarding in
EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03.txt, work in
progress, February, 2017
[EVPN-OVERLAY] Sajassi-Drake et al., "A Network Virtualization
Overlay Solution using EVPN", draft-ietf-bess-evpn-overlay-07.txt,
work in progress, November, 2016
**** Think carefully about whether the references to the above two drafts
**** are really normative. I'm not sure about [EVPN-INTERSUBNET], but I
**** don't think the reference to [EVPN-OVERLAY] is really normative.
**** Normative references to internet-drafts can result in an RFC-to-be
**** remaining on the publication queue for years and years. (Don't ask me
**** how I know ;-))
9.2 Informative References
10. Acknowledgments
The authors would like to thank Mukul Katiyar, Eric Rosen and Jeffrey
Zhang for their valuable feedback and contributions. The following
people also helped improve this document with their feedback: Tony
Przygienda and Thomas Morin.
11. Contributors
In addition to the authors listed on the front page, the following
co-authors have also contributed to this document:
Senthil Sathappan
Florin Balus
Aldrin Isaac
Senad Palislamovic
12. Authors' Addresses
Jorge Rabadan (Editor)
Nokia
777 E. Middlefield Road
Mountain View, CA 94043 USA
Email: [email protected]
Wim Henderickx
Nokia
Email: [email protected]
John E. Drake
Juniper
Email: [email protected]
Ali Sajassi
Cisco
Email: [email protected]
Wen Lin
Juniper
Email: [email protected]