Hi Ali,
Thanks for the quick respin, which covers many of the points.
(inlined below, skipping the resolved points)
2016-09-02, Ali Sajassi (sajassi):
sites albeit for different EVIs.
+---------+ +---------+
| PE1 | | PE2 |
+---+ | +---+ | +------+ | +---+ | +---+
|CE1+---ES1----+--+ | | | MPLS | | | +--+----ES2-----+CE2|
+---+ (Root) | |MAC| | | /IP | | |MAC| | (Leaf) +---+
| |VRF| | | | | |VRF| |
| | | | | | | | | | +---+
| | | | | | | | +--+----ES3-----+CE3|
| +---+ | +------+ | +---+ | (Leaf) +---+
+---------+ +---------+
Figure 1: Scenario 1
In such scenario, an EVPN PE implementation MAY provide E-TREE
service using topology constraint among the PEs belonging to the same
"topology constraint" is a bit opaque as a term, perhaps "using
tailored BGP RT import/export policies" would be more descriptive
(assuming I understood your intent)
Done. Changed it to “topology constraint tailored by BGP Route Target
(RT) import/export policies"
(I still think that "topology" is not a helpful term to use here.)
EVI. The purpose of this topology constraint is to avoid having PEs
with only Leaf sites importing and processing BGP MAC routes from
each other. To support such topology constraint in EVPN, two BGP
Route-Targets (RTs) are used for every EVPN Instance (EVI): one RT is
associated with the Root sites and the other is associated with the
Leaf sites. On a per EVI basis, every PE exports the single RT
associated with its type of site(s). Furthermore, a PE with Root
site(s) imports both Root and Leaf RTs, whereas a PE with Leaf
site(s) only imports the Root RT.
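As a rough illustration of the import/export rule quoted above, here is a minimal Python sketch. The RT values, the PE class and the third PE (PE3) are hypothetical and only serve the example; nothing here is taken from the draft or from an implementation.

```python
from dataclasses import dataclass

# Hypothetical RT values for one EVI; real values are operator-assigned.
ROOT_RT = "65000:100"
LEAF_RT = "65000:101"


@dataclass(frozen=True)
class PE:
    name: str
    role: str  # "root" or "leaf": the type of site(s) attached for this EVI

    def export_rt(self) -> str:
        # Every PE exports the single RT associated with its type of site(s).
        return ROOT_RT if self.role == "root" else LEAF_RT

    def import_rts(self) -> frozenset:
        # A PE with Root site(s) imports both RTs; a PE with Leaf site(s)
        # imports only the Root RT.
        if self.role == "root":
            return frozenset({ROOT_RT, LEAF_RT})
        return frozenset({ROOT_RT})


def imports_routes_from(receiver: PE, sender: PE) -> bool:
    """True if receiver imports the MAC routes exported by sender."""
    return sender.export_rt() in receiver.import_rts()


pe1 = PE("PE1", "root")
pe2 = PE("PE2", "leaf")
pe3 = PE("PE3", "leaf")  # hypothetical second Leaf PE, not in Figure 1
```

With these definitions, PE2 and PE3 do not import each other's routes, while both exchange routes with PE1. Note that this only models which PEs import routes from which other PEs; it does not model forwarding between ACs attached to the same MAC-VRF.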
The text seems to imply that the above is sufficient to deliver the
service, but I fail to see what would prevent Leaf-to-Leaf traffic
between Leaves bound to the same MAC-VRF (ES2 and ES3 in Figure 1).
Shouldn't the text mention the use of a split-horizon in Leaf MAC-VRFs ?
Agreed, nice catch! I changed the first sentence from:
"In such scenario, an EVPN PE implementation MAY provide E-TREE
service using topology constraint among the PEs belonging to the same
EVI."
TO
"In such scenario, topology constraint, provided by BGP Route Target
(RT) import/export policies among the PEs belonging to the same EVI,
can be used to restrict the communications among Leaf PEs."
The sentence above does not address my question in fact, which was about
communication between Leaf ACs (rather than about communication between
Leaf PEs)
Let me restate here, more clearly: I fail to see what would prevent
Leaf-to-Leaf traffic between **ACs** bound to the same MAC-VRF (ES2 and
ES3 in Figure 1). Shouldn't the text mention the use of a split-horizon
in Leaf MAC-VRFs ?
(assuming the previous point is resolved:)
With this mechanism above, isn't it possible to have on a given PE,
for a single E-TREE EVI, both Leaves and Roots, as long as distinct
MAC-VRFs are used (one for Leaves and one for Roots) ? (it seems to
me that the asymmetric import/export RT would do what is needed to
build an E-TREE, we would just have a particular case where a Leaf
MAC-VRF and a Root MAC-VRF for a given E-TREE end up on a single PE)
That’s not possible because per definition of an EVI, there is only a
single MAC-VRF per EVI for a PE.
Where can I read such a definition ? (the Terminology section in RFC7432
does not say that, unless I'm missing something).
And that seems a completely arbitrary restriction.
(just thinking that a given PE device can be split into two logical
devices shows that it can work)
Besides, I don’t understand what good does it do to have two MAC-VRFs
on the same PE (one for Leafs and another for Roots)
Well, the "what is good for" is pretty simple: it means you can have,
just by tailoring the import/export policies like in 2.1, something as
useful as the scenario in 2.2.
because Leafs and Roots need to talk to each other and thus we want
them to be in the same MAC-VRF.
The fact that Leafs and Roots need to talk to each other does not mean
that they *have* to be in the same MAC-VRF: you can rely on the local
MPLS dataplane inside the PE to carry the traffic between a Leaf MAC-VRF
and a Root MAC-VRF (and you can possibly implement a shortcut not
involving MPLS encap/decap).
However, Leafs should not talk among themselves and thus we can put
all the Leaf ACs in a split-horizon group.
Yes, this is the meaning of my initial comment above and it is true
independently of whether or not you consider the possibility of having
both a Roots MAC-VRF and Leaf MAC-VRF on a same PE.
If this is not possible, I think the text should explain why.
I don’t think we need an explanation because of the above reason but
if you think otherwise, then please suggest a text as what do you
think I should add.
Two possibilities:
- if indeed there is no possibility of having, for a given E-Tree, both
a Root MAC-VRF and a Leaf MAC-VRF on a given PE, then the text only
misses an explanation of why it is not possible
- else, if the possibility exists, then it means that the asymmetric RT
procedures currently described in 2.1 are in fact another way of
addressing the scenario supported by 2.2 ("a PE receives traffic from
either Root OR Leaf sites (but not both) on a given Attachment Circuit
(AC) of an EVI."), so the content of 2.1 and 2.2 would be two approaches
for supporting this scenario (2.1 -> "Approach A, Root MAC-VRF + Leaf
MAC-VRF, two RTs", and 2.2 -> "Approach B, Root/Leaf MAC-VRF, single RT")
2.2 Scenario 2: Leaf OR Root site(s) per AC
In this scenario, a PE receives traffic from either Root OR Leaf
sites (but not both) on a given Attachment Circuit (AC) of an EVI. In
other words, an AC (ES or ES/VLAN) is either associated with a Root
or Leaf (but not both).
s/with a Root or Leaf/with Roots or Leaves/ ?
Agree – Changed it to "Root(s) or Leaf(s)"
Re-reading and thinking a bit: "an AC is either a Root AC or a Leaf AC
(but not both)" would be much much clearer ?
or "an AC is either associated as a Root or as a Leaf (but not both)"
perhaps.
(but my initial suggestion wasn't great)
+---------+ +---------+
| PE1 | | PE2 |
+---+ | +---+ | +------+ | +---+ | +---+
|CE1+-----ES1----+--+ | | | | | | +--+---ES2/AC1--+CE2|
+---+ (Leaf) | |MAC| | | MPLS | | |MAC| | (Leaf) +---+
| |VRF| | | /IP | | |VRF| |
| | | | | | | | | | +---+
| | | | | | | | +--+---ES2/AC2--+CE3|
| +---+ | +------+ | +---+ | (Root) +---+
+---------+ +---------+
Figure 2: Scenario 2
In this scenario, if there are PEs with only root (or leaf) sites per
EVI, then the RT constraint procedures described in section 2.1 can
also be used here. However, when a Root site is added to a Leaf PE,
then that PE needs to process MAC routes from all other Leaf PEs and
add them to its forwarding table.
This is the case in 2.1 as well, isn't it ?
It can start as 2.1 but as soon as you add Root site to a Leaf PE,
then it becomes different (per last sentence of the above para).
I guess we need to first conclude the discussion about the section 2.1,
before the above can be discussed efficiently.
For this scenario, if for a given
EVI, the majority of PEs will eventually have both Leaf and Root
sites attached, even though they may start as Root-only or Leaf-only
PEs, then it is recommended to use a single RT per EVI and avoid
additional configuration and operational overhead.
Why this recommendation ?
Even with a majority of PEs having both Leaves and Roots, there can
remain (up to 49% of) PEs having only Leaves, which will uselessly
have all routes to other Leaves.
So "it is recommended" above, deserves to be explained more, I think.
OK, I changed “majority” to “vast majority” :-)
My point was not to nitpick on "majority", but was that you should
explain why you recommend that.
As the text currently reads, the cost of the recommendation can be
identified: having useless routes on the fraction of PEs having only Leaves.
But the gain brought by the recommendation is not even mentioned, not to
say explained.
Hence: why ?
(Why is it a useful tradeoff to have useless routes on some, even if
only one, PE ?)
is on a per MAC address. This scenario is considered in
this draft for EVPN service with only known unicast traffic - i.e.,
there is no BUM traffic.
"there is no BUM" is quite a bold claim ! :=
Maybe the text should say "no BUM traffic is supported (BUM traffic
will be dropped)" ?
(possibly "BUM traffic from Leaves will be dropped" would be sufficient ?)
Changed it to “BUM traffic is not supported in this scenario and it is
dropped”.
adding "by the ingress PE" ?
+---------+ +---------+
| PE1 | | PE2 |
+---+ | +---+ | +------+ | +---+ | +---+
|CE1+-----ES1----+--+ | | | | | | +--+---ES2/AC1--+CE2|
+---+ (Root) | | E | | | MPLS | | | E | | (Leaf/Root)+---+
| | V | | | /IP | | | V | |
| | I | | | | | | I | | +---+
| | | | | | | | +--+---ES2/AC2--+CE3|
| +---+ | +------+ | +---+ | (Leaf) +---+
+---------+ +---------+
Figure 3: Scenario 3
3 Operation for EVPN
[RFC7432] defines the notion of ESI MPLS label used for split-horizon
filtering of BUM traffic at the egress PE. Such egress filtering
capabilities can be leveraged in provision of E-TREE services as seen
shortly. In other words, [RFC7432] has inherent capability to support
E-TREE services without defining any new BGP routes but by just
defining a new BGP Extended Community for leaf indication as shown
later in this document.
3.1 Known Unicast Traffic
Since in EVPN, MAC learning is performed in control plane via
advertisement of BGP routes, the filtering needed by E-TREE service
for known unicast traffic can be performed at the ingress PE, thus
providing very efficient filtering and avoiding sending known unicast
traffic over MPLS/IP core to be filtered at the egress PE as done in
traditional E-TREE solutions (e.g., E-TREE for VPLS).
To provide such ingress filtering for known unicast traffic, a PE
MUST indicate to other PEs what kind of sites (root or leaf) its MAC
addresses are associated with by advertising a leaf indication flag
(via an Extended Community) along with each of its MAC/IP
Advertisement route. The lack of such flag indicates that the MAC
address is associated with a root site.
This scheme applies to all
scenarios described in section 2.
Furthermore, for multi-homing scenario of section 2.2, where an AC is
either root or leaf (but not both), the PE MAY advertise leaf
indication along with the Ethernet A-D per EVI route. This
advertisement is used for sanity checking in control-plane to ensure
that there is no discrepancy in configuration among different PEs of
the same redundancy group. For example, if a leaf site is multi-homed
to PE1 and PE2, and PE1 advertises the Ethernet A-D per EVI
corresponding to this leaf site with the leaf-indication flag but PE2
does not, then the receiving PE notifies the operator of such
discrepancy and ignores the leaf-indication flag on PE1. In other
words, in case of discrepancy, the multi-homing for that pair of PEs
is assumed to be in default "root" mode for that <ESI, EVI> or <ESI,
EVI/VLAN>. The leaf indication flag on Ethernet A-D per EVI route
tells the receiving PEs that all MAC addresses associated with this
<ESI, EVI> or <ESI, EVI/VLAN> are from a leaf site. Therefore, if a
PE receives a leaf indication for an AC via the Ethernet A-D per EVI
route but doesn't receive a leaf indication in the corresponding MAC
route, then it notifies the operator and ignores the leaf indication on
the Ethernet A-D per EVI route.
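To make the first sanity check in the paragraph above concrete (the PE1/PE2 discrepancy case), here is a minimal Python sketch. The function name and the shape of its input are assumptions made for illustration; the draft does not define them.

```python
def resolve_segment_mode(leaf_flags: dict) -> tuple:
    """Resolve the root/leaf mode of one <ESI, EVI> on a receiving PE.

    leaf_flags maps the name of each advertising PE to whether its
    Ethernet A-D per EVI route carried the leaf indication
    (hypothetical input shape). Returns (mode, discrepancy).
    """
    flags = set(leaf_flags.values())
    if flags == {True}:
        return ("leaf", False)
    if flags == {False}:
        return ("root", False)
    # Mixed advertisements (or none at all): fall back to the default
    # "root" mode. A discrepancy is reported only if PEs really disagree.
    return ("root", len(flags) > 1)
```

For example, `resolve_segment_mode({"PE1": True, "PE2": False})` yields `("root", True)`: the leaf indication from PE1 is ignored and the caller is expected to notify the operator.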
The procedure above should I think be rephrased to provide unambiguous
interpretation in the case where a given MAC is being announced in
more than one MAC/IP advertisement route, possibly carrying a
different leaf indication (and even possibly from different ESes, or
from PEs not advertising Ethernet A-D route).
Are you talking about MAC move where a MAC can move between Root and
Leaf sites? If so, MAC mobility procedure takes precedence. I have
added the following paragraph toward the end of this section:
"In situation where MAC moves are allowed among Leaf and Root sites
(e.g., non-static MAC), PEs can receive multiple MAC/IP advertisement
routes for the same MAC address with different Leaf/Root indications
(and possibly different ESIs for multi-homing scenarios). In such
situations, MAC mobility procedures take precedence to first identify
the location of the MAC before associating that MAC with a Root or a
Leaf site."
Tagging MAC addresses with a leaf indication enables remote PEs to
perform ingress filtering for known unicast traffic - i.e., on the
ingress PE, the MAC destination address lookup yields, in addition to
the forwarding adjacency, a flag which indicates whether the target
MAC is associated with a Leaf site or not.
Ditto, more or less: the procedure above should I think be rephrased
to provide unambiguous interpretation in the case where a given MAC is
being announced in more than one MAC/IP advertisement route, possibly
carrying a different leaf indication.
The new paragraph will take care of it.
The new paragraph takes care of the MAC mobility case, but there
possibly remains the case of a MAC being advertised in two distinct
MAC/IP advertisement routes for the same dual-homed ES, in the case where
this ES is flagged as Leaf or Root consistently from the two dual-homing
PEs.
The ingress PE cross-
checks this flag with the status of the originating AC, and if both
are Leafs, then the packet is not forwarded.
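The ingress check described above can be sketched as follows, in Python. This is only an illustration under simplifying assumptions: the MAC table is a plain dictionary, the forwarding adjacency is reduced to a PE name, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MacEntry:
    next_hop: str  # forwarding adjacency, reduced here to a PE name
    is_leaf: bool  # leaf indication learned with the MAC/IP route


def ingress_forward(
    mac_table: Dict[str, MacEntry], dst_mac: str, source_ac_is_leaf: bool
) -> Optional[str]:
    """Return the next hop for a known unicast frame, or None if dropped.

    Only known unicast is covered: an unknown destination MAC raises
    KeyError, since that case belongs to BUM handling.
    """
    entry = mac_table[dst_mac]
    if source_ac_is_leaf and entry.is_leaf:
        return None  # Leaf-to-Leaf: not forwarded by the ingress PE
    return entry.next_hop


table = {
    "00:00:5e:00:53:01": MacEntry("PE2", is_leaf=True),
    "00:00:5e:00:53:02": MacEntry("PE2", is_leaf=False),
}
```

A frame received on a Leaf AC and destined to the first MAC is dropped, while the same frame received on a Root AC is forwarded to PE2.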
To support the above ingress filtering functionality, a new E-TREE
Extended Community with a Leaf indication flag is introduced [section
5.2]. This new Extended Community MUST be advertised with MAC/IP
Advertisement route and MAY be advertised with an Ethernet A-D per
EVI route as described above.
3.2 BUM Traffic
For BUM traffic, it is not possible to perform filtering on the
ingress PE, as is the case with known unicast, because of the multi-
destination nature of the traffic.
Saying "it is not possible" without more explanation is not very
useful (the reader may think about using RPF-like techniques on the
egress PE).
It seems to me more reasonable to formulate things in terms of "This
specification does not provide support for filtering BUM traffic on
the ingress PE", and avoid a sentence like the one above.
OK, Changed the sentence to:
"This specification does not provide support for filtering BUM traffic
on the ingress PE because it is not possible to perform filtering of
BUM traffic on the ingress PE, as is the case with known unicast
described above, due to the multi-destination nature of BUM traffic."
Ok.
As such, the solution relies on
egress filtering. In order to apply the proper egress filtering,
which varies based on whether a packet is sent from a Leaf AC or a
root AC, the MPLS-encapsulated frames MUST be tagged with an
indication when they originated from a Leaf AC. In other words, leaf
indication for BUM traffic is done at the granularity of AC. This can
be achieved in EVPN through the use of a MPLS label where it can be
used to either identify the Ethernet segment of origin per [RFC7432]
(i.e., ESI label) or it can be used to indicate that the packet is
originated from a leaf site (Leaf label).
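As an illustration of the egress side, here is a minimal Python sketch of the decision taken once the egress PE knows, from the label, whether the frame originated from a Leaf AC. It deliberately ignores how the label is allocated and the RFC7432 multi-homing split-horizon; the function name and the AC table are hypothetical.

```python
def egress_acs_for_bum(local_acs: dict, from_leaf: bool) -> list:
    """Return the local ACs a BUM frame is replicated to.

    local_acs maps an AC name to "root" or "leaf". A frame tagged as
    originating from a Leaf AC is not delivered to local Leaf ACs.
    """
    return sorted(
        ac
        for ac, role in local_acs.items()
        if not (from_leaf and role == "leaf")
    )


# ACs of PE2 as drawn in Figure 2.
pe2_acs = {"ES2/AC1": "leaf", "ES2/AC2": "root"}
```

Here a frame carrying the leaf indication is delivered only to ES2/AC2, while a frame from a Root AC is delivered to both ACs.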
BUM traffic sent over a P2MP LSP or ingress replication, may need to
carry an upstream assigned or downstream assigned MPLS label
(respectively) for the purpose of egress filtering to indicate to the
egress PEs whether this packet is originated from a leaf AC.
The main difference between downstream and upstream assigned MPLS
label is that in case of downstream assigned not all egress PE
devices need to receive the label just like ingress replication
procedures defined in [RFC7432].
There are four scenarios to consider as follows. In all these
scenarios, the imposition PE imposes the right MPLS label associated
with the originated Ethernet Segment (ES) depending on whether the
Ethernet frame originated from a Root or a Leaf site on that Ethernet
Segment (ESI or Leaf label).
The mechanism by which the PE identifies
whether a given frame originated from a Root or a Leaf site on the
segment is based on the Ethernet Tag associated with the frame (e.g.,
whether the frame was received on a leaf or a root AC).
First comment: it seems that the formulation should also support the
case where an AC does not use .1q.
Agree. Change the sentence to:
"The mechanism by which the PE identifies whether a given frame
originated from a Root or a Leaf site on the segment is based on the
AC identifier for that segment (e.g., Ethernet Tag of the frame for
802.1Q frames). Other mechanisms for identifying root or leaf (e.g.,
on a per MAC address basis) are beyond the scope of this document."
Ok.
(side comment: doing the identification based on the source MAC
address would seem to allow BUM in the context of 2.3; it is out of
the scope of my review to extend the scope of these specs, but I'm
curious why it is not proposed....)
If we went for per MAC root/leaf identification, then this would have
expanded the scope of DF election and egress filtering beyond that of
RFC 7432. Currently, we don’t have any such requirements from
operators and service providers.
Does the above mean that scenario 2.3 excludes BUM because the DF
Election mechanism would not be compatible with the egress filtering
mechanism ?
Providing the explanation in 2.3 would I think be helpful.
4.2 BUM Traffic
For BUM traffic, the PEs must perform egress filtering. When a PE
receives a MAC advertisement route (which will be used as a source B-
MAC), it updates its Ethernet Segment egress filtering function
The "its Ethernet Segment egress filtering function" phrase makes it
sound like we're talking about a well-known function defined somewhere.
If this is indeed the case, providing a reference would be in order.
If not, then explaining what this function is would be required.
Changed the sentence to:
"When a PE receives a MAC advertisement route (which will be used as a
source B-MAC for BUM traffic), it updates its egress filtering (based
on the source B-MAC address), as follows:"
(Are you talking about doing something similar to what 3.2 specifies
for the non-PBB procedures ?)
Correct. Similar to 3.2 but based on B-MAC address.
Ok.
(based on the source B-MAC address), as follows:
- If the MAC Advertisement route indicates that the advertised B-MAC
is a Leaf, and the local Ethernet Segment is a Leaf as well, then the
source B-MAC address is added to the B-MAC filtering list.
Changed it to:
“… is added to its B-MAC list used for egress filtering."
Implicitly we can guess that this "filtering list" is a list of things
to exclude, rather than a list of things to include, but the text
should I think be explicit.
Changed it as above.
We still don't know if the list is a list of B-MAC to reject or to accept ?
(filter out what is specified in the list vs. filter to keep only what
is specified in the list)
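For illustration only, here is a minimal Python sketch of the reject-list reading, which is just one of the two readings asked about above and is an assumption, not something the draft text settles. The class and method names are hypothetical.

```python
class BMacEgressFilter:
    """Egress filter for one local Ethernet Segment (reject-list reading)."""

    def __init__(self, local_segment_is_leaf: bool):
        self.local_segment_is_leaf = local_segment_is_leaf
        self.reject = set()  # source B-MACs whose BUM traffic is dropped

    def on_mac_route(self, b_mac: str, advertised_as_leaf: bool) -> None:
        # Leaf B-MAC advertised and local segment is a Leaf as well:
        # add the source B-MAC to the list used for egress filtering.
        if advertised_as_leaf and self.local_segment_is_leaf:
            self.reject.add(b_mac)

    def accepts(self, source_b_mac: str) -> bool:
        # Assumption: the list enumerates sources to filter out.
        return source_b_mac not in self.reject


leaf_segment = BMacEgressFilter(local_segment_is_leaf=True)
leaf_segment.on_mac_route("00:00:5e:00:53:0a", advertised_as_leaf=True)
leaf_segment.on_mac_route("00:00:5e:00:53:0b", advertised_as_leaf=False)
```

Under the accept-list reading, `accepts` would be inverted and the list would have to be populated with Root B-MACs instead, which is why the text needs to say which one is meant.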
5.2 PMSI Tunnel Attribute
[RFC6514] defines PMSI Tunnel attribute which is an optional
transitive attribute with the following format:
+---------------------------------+
| Flags (1 octet) |
+---------------------------------+
| Tunnel Type (1 octets) |
+---------------------------------+
| MPLS Label (3 octets) |
+---------------------------------+
| Tunnel Identifier (variable) |
+---------------------------------+
This draft uses all the fields per existing definition except for the
following modifications to the Tunnel Type and Tunnel Identifier:
When receiver ingress-replication label is needed, the high-order bit
of the tunnel type field (C bit - Composite tunnel bit) is set while
the remaining low-order seven bits indicate the tunnel type as
before. When this C bit is set, the "tunnel identifier" field would
begin with a three-octet label, followed by the actual tunnel
identifier for the transmit tunnel. PEs that don't understand the
new meaning of the high-order bit would treat the tunnel type as an
invalid tunnel type. For the PEs that do understand the new meaning
of the high-order bit, if ingress replication is desired when sending BUM
traffic, the PE will use the label in the Tunnel Identifier field
when sending its BUM traffic.
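The C bit handling described above can be sketched in Python as follows. This is a minimal illustration, not a PMSI Tunnel attribute parser: the function name is hypothetical, and the three-octet label is returned as raw octets because the quoted text does not state its bit layout.

```python
C_BIT = 0x80  # high-order bit of the Tunnel Type octet (Composite tunnel bit)


def parse_tunnel_type(tunnel_type: int, tunnel_identifier: bytes) -> tuple:
    """Return (base_type, label_octets, transmit_tunnel_identifier).

    label_octets is None when the C bit is clear; otherwise it holds the
    three octets found at the start of the Tunnel Identifier field.
    """
    if not 0 <= tunnel_type <= 0xFF:
        raise ValueError("Tunnel Type is a single octet")
    base_type = tunnel_type & 0x7F  # low-order seven bits: the tunnel type
    if not tunnel_type & C_BIT:
        return (base_type, None, tunnel_identifier)
    if len(tunnel_identifier) < 3:
        raise ValueError("composite Tunnel Identifier shorter than 3 octets")
    return (base_type, tunnel_identifier[:3], tunnel_identifier[3:])
```

A receiver that does not implement this would see a value such as 0x82 as an unknown tunnel type, which is the behavior the paragraph above relies on.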
Additionally, since RFC7385 has created a registry for PMSI Tunnel
attribute tunnel types, taking the most significant bit from this
field can't be done without a significant change of how this registry
is organized (because now you can't take value in 0x7b-0x7f without
colliding into values which are Experimental or Reserved).
Achieving the above requires an update of RFC7385, so I would suggest
adding an 8.1 section saying this:
---
The "P-Multicast Service Interface Tunnel (PMSI Tunnel) Tunnel Types"
registry in the "Border Gateway Protocol (BGP) Parameters" registry
needs to be updated to reflect the use of the most significant bit to
advertise the use of "composite tunnels" (section 5.2).
For this purpose, this document updates RFC7385.
The registry is to be updated, by removing the entries for 0xFB-0xFE
and 0xFF, and replacing them by:
- 0x7B-0x7E Reserved for Experimental Use [this document]
- 0x7F Reserved [this document]
- 0x80-0xFF Not Allocatable, corresponds to Composite tunnel types
[this document]
The allocation policy for values 0x00 to 0x7A is IETF Review [RFC5226
<https://tools.ietf.org/html/rfc5226>].
The range for experimental use is now 0x7B-0x7E, and values in this
range are not to be assigned.
The status of 0x7F may only be changed through Standards Action
[RFC5226 <https://tools.ietf.org/html/rfc5226>].
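The resulting partition of the one-octet Tunnel Type space, as proposed in the text above, can be checked with a small Python sketch. The function name and the returned strings are made up for the example.

```python
def registry_status(value: int) -> str:
    """Classify a PMSI Tunnel Type value under the proposed registry layout."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("Tunnel Type is a single octet")
    if value <= 0x7A:
        return "IETF Review"
    if value <= 0x7E:
        return "Reserved for Experimental Use"
    if value == 0x7F:
        return "Reserved"
    # 0x80-0xFF: most significant bit set, i.e. a composite tunnel type.
    return "Not Allocatable"
```

Every value from 0x00 to 0xFF falls in exactly one of the four ranges, so the proposed layout leaves no gap and no overlap.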
Done. Thanks for providing the text. It was very helpful.
Ok.
One thing: in the revised text, line breaks are missing for the bullet
list ("- 0x7B-0x7E Reserved for Experimental Use [this document]- 0x7F
Reserved [this document]- 0x80-0xFF Not Allocatable, corresponds to
Composite tunnel types [this document]").