Re: [bess] shepherd review of draft-ietf-bess-evpn-etree

Thomas Morin Fri, 02 Sep 2016 08:11:43 -0700

Hi Ali,

Thanks for the quick respin, which covers many of the points.


(inlined below, skipping the resolved points)

2016-09-02, Ali Sajassi (sajassi):

   sites albeit for different EVIs.


                   +---------+            +---------+
                   |   PE1   |            |   PE2   |
    +---+          |  +---+  |  +------+  |  +---+ |            +---+
    |CE1+---ES1----+--+   |  |  | MPLS |  |  | +--+----ES2-----+CE2|
    +---+  (Root)  |  |MAC|  |  |  /IP |  |  |MAC|  | (Leaf)   +---+
                   |  |VRF|  |  |      |  |  |VRF|  |
                   |  |   |  |  |      |  |  |   | |            +---+
                   |  |   |  |  |      |  |  | +--+----ES3-----+CE3|
                   |  +---+  |  +------+  |  +---+  | (Leaf)   +---+
                   +---------+            +---------+

   Figure 1: Scenario 1

   In such scenario, an EVPN PE implementation MAY provide E-TREE
   service using topology constraint among the PEs belonging to the same

"topology constraint" is a bit opaque as a term; perhaps "using tailored BGP RT import/export policies" would be more descriptive (assuming I understood your intent).

Done. Changed it to “topology constraint tailored by BGP Route Target (RT) import/export policies”


(I still think that "topology" is not a helpful term to use here.)

   EVI. The purpose of this topology constraint is to avoid having PEs
   with only Leaf sites importing and processing BGP MAC routes from
   each other. To support such topology constraint in EVPN, two BGP
   Route-Targets (RTs) are used for every EVPN Instance (EVI): one RT is
   associated with the Root sites and the other is associated with the
   Leaf sites. On a per EVI basis, every PE exports the single RT
   associated with its type of site(s). Furthermore, a PE with Root
   site(s) imports both Root and Leaf RTs, whereas a PE with Leaf
   site(s) only imports the Root RT.
The text seems to imply that the above is sufficient to deliver the service, but I fail to see what would prevent Leaf-to-Leaf traffic between Leaves bound to the same MAC-VRF (ES2 and ES3 in Figure 1). Shouldn't the text mention the use of a split-horizon in Leaf MAC-VRFs ?
Agree, nice catch! I changed the first sentence from:
"In such scenario, an EVPN PE implementation MAY provide E-TREE service using topology constraint among the PEs belonging to the same EVI."
TO
"In such scenario, topology constraint, provided by BGP Route Target (RT) import/export policies among the PEs belonging to the same EVI, can be used to restrict the communications among Leaf PEs."

The sentence above does not in fact address my question, which was about communication between Leaf ACs (rather than communication between Leaf PEs). Let me restate it more clearly: I fail to see what would prevent Leaf-to-Leaf traffic between **ACs** bound to the same MAC-VRF (ES2 and ES3 in Figure 1). Shouldn't the text mention the use of a split-horizon in Leaf MAC-VRFs ?
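A minimal Python sketch may make the gap concrete. It assumes a single EVI, Root-only or Leaf-only PEs as in Figure 1, and made-up RT strings. `export_rts`, `import_rts` and `local_switching_allowed` are hypothetical helpers, not anything defined in the draft. The two-RT policy keeps Leaf PEs from importing each other's MAC routes, but it has no say over frames switched locally between ES2 and ES3 on PE2. Only a split-horizon group on the Leaf ACs stops those.

```python
# Hypothetical RT values for a single E-TREE EVI (illustrative only).
ROOT_RT = "target:64500:1"
LEAF_RT = "target:64500:2"


def export_rts(pe_role):
    """A PE exports the single RT associated with its type of site(s)."""
    return {ROOT_RT} if pe_role == "root" else {LEAF_RT}


def import_rts(pe_role):
    """Root PEs import both RTs; Leaf-only PEs import only the Root RT."""
    return {ROOT_RT, LEAF_RT} if pe_role == "root" else {ROOT_RT}


def imports_routes_from(receiver_role, sender_role):
    """True if the receiving PE imports the MAC routes the sender exports."""
    return bool(import_rts(receiver_role) & export_rts(sender_role))


def local_switching_allowed(src_ac_role, dst_ac_role, leaf_split_horizon):
    """Traffic between two ACs of the same MAC-VRF never reaches the core,
    so RT policy cannot block it; only a split-horizon group on Leaf ACs can."""
    if src_ac_role == "leaf" and dst_ac_role == "leaf":
        return not leaf_split_horizon
    return True


# Figure 1: PE1 is Root-only; PE2 is Leaf-only, with Leaf ACs on ES2 and ES3.
print(imports_routes_from("leaf", "leaf"))                                # False
print(local_switching_allowed("leaf", "leaf", leaf_split_horizon=False))  # True
```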

(assuming the previous point is resolved:)
With the mechanism above, isn't it possible to have on a given PE, for a single E-TREE EVI, both Leaves and Roots, as long as distinct MAC-VRFs are used (one for Leaves and one for Roots) ? (It seems to me that the asymmetric import/export RTs would do what is needed to build an E-TREE; we would just have a particular case where a Leaf MAC-VRF and a Root MAC-VRF for a given E-TREE end up on a single PE.)
That’s not possible because per definition of an EVI, there is only a single MAC-VRF per EVI for a PE.

Where can I read such a definition ? (The Terminology section in RFC7432 does not say that, unless I'm missing something.)

And that seems a completely arbitrary restriction.

(Just thinking that a given PE device can be split in two logical devices shows that it can work.)

Besides, I don’t understand what good it does to have two MAC-VRFs on the same PE (one for Leafs and another for Roots)

Well, the "what is good for" is pretty simple: it means you can have, just by tailoring the import/export policies like in 2.1, something as useful as the scenario in 2.2.

because Leafs and Roots need to talk to each other and thus we want them to be in the same MAC-VRF.

The fact that Leafs and Roots need to talk to each other does not mean that they *have* to be in the same MAC-VRF: you can rely on the local MPLS dataplane inside the PE to carry the traffic between Roots and Leaves, i.e., between a Leaf MAC-VRF and a Root MAC-VRF (and you can possibly implement a shortcut not involving MPLS encap/decap).

However, Leafs should not talk among themselves and thus we can putall the Leaf ACs in a split-horizon group.

Yes, this is the meaning of my initial comment above, and it is true independently of whether or not you consider the possibility of having both a Roots MAC-VRF and a Leaf MAC-VRF on the same PE.

If this is not possible, I think the text should explain why.
I don’t think we need an explanation because of the above reason, but if you think otherwise, then please suggest text as to what you think I should add.


Two possibilities:

- if indeed there is no possibility of having, for a given E-Tree, both a Root MAC-VRF and a Leaf MAC-VRF on a given PE, then the text only misses an explanation of why it is not possible;
- else, if the possibility exists, then it means that the asymmetric RT procedures currently described in 2.1 are in fact another way of addressing the scenario supported by 2.2 ("a PE receives traffic from either Root OR Leaf sites (but not both) on a given Attachment Circuit (AC) of an EVI."), so 2.1 and 2.2 would describe two approaches for supporting this scenario (2.1 --> "Approach A, Root MAC-VRF + Leaf MAC-VRF, two RTs", and 2.2 --> "Approach B, Root/Leaf MAC-VRF, single RT").


2.2 Scenario 2: Leaf OR Root site(s) per AC

   In this scenario, a PE receives traffic from either Root OR Leaf
   sites (but not both) on a given Attachment Circuit (AC) of an EVI. In
   other words, an AC (ES or ES/VLAN) is either associated with a Root
   or Leaf (but not both).


s/with a Root or Leaf/with Roots or Leaves/ ?

Agree – Changed it to "Root(s) or Leaf(s)"

Re-reading and thinking a bit: "an AC is either a Root AC or a Leaf AC (but not both)" would be much much clearer ? Or perhaps "an AC is either associated as a Root or as a Leaf (but not both)".

(but my initial suggestion wasn't great)


                     +---------+            +---------+
                     |   PE1   |            |   PE2   |
    +---+            |  +---+  |  +------+  |  +---+ |            +---+
    |CE1+-----ES1----+--+   |  |  |      |  |  | +--+---ES2/AC1--+CE2|
    +---+    (Leaf)  |  |MAC|  |  | MPLS |  |  |MAC|  | (Leaf)   +---+
                     |  |VRF|  |  |  /IP |  |  |VRF|  |
                     |  |   |  |  |      |  |  |   | |            +---+
                     |  |   |  |  |      |  |  | +--+---ES2/AC2--+CE3|
                     |  +---+  |  +------+  |  +---+  | (Root)   +---+
                     +---------+            +---------+

   Figure 2: Scenario 2

   In this scenario, if there are PEs with only root (or leaf) sites per
   EVI, then the RT constraint procedures described in section 2.1 can
   also be used here. However, when a Root site is added to a Leaf PE,
   then that PE needs to process MAC routes from all other Leaf PEs and
   add them to its forwarding table.


This is the case in 2.1 as well, isn't it ?

It can start as 2.1, but as soon as you add a Root site to a Leaf PE, it becomes different (per the last sentence of the above para).

I guess we need to first conclude the discussion about section 2.1 before the above can be discussed efficiently.

For this scenario, if for a given
   EVI, the majority of PEs will eventually have both Leaf and Root
   sites attached, even though they may start as Root-only or Leaf-only
   PEs, then it is recommended to use a single RT per EVI and avoid
   additional configuration and operational overhead.


Why this recommendation ?

Even with a majority of PEs having both Leaves and Roots, there can remain (up to 49% of) PEs having only Leaves, which will uselessly have all routes to other Leaves.


So "it is recommended" above, deserves to be explained more, I think.

OK, I changed “majority” to “vast majority” :-)

My point was not to nitpick on "majority", but that you should explain why you recommend that. As the text currently reads, the cost of the recommendation can be identified: having useless routes on the fraction of PEs having only Leaves. But the gain brought by the recommendation is not even mentioned, let alone explained.

Hence: why ?

(Why is it a useful tradeoff to have useless routes on some, even if only one, PE ?)

is on a per MAC address. This scenario is considered in
   this draft for EVPN service with only known unicast traffic - i.e.,
   there is no BUM traffic.
"there is no BUM" is quite a bold claim ! :=
Maybe the text should say "no BUM traffic is supported (BUM traffic will be dropped)" ?
(possibly "BUM traffic from Leaves will be dropped" would be sufficient ?)
Changed it to “BUM traffic is not supported in this scenario and it is dropped”.


adding "by the ingress PE" ?


                     +---------+            +---------+
                     |   PE1   |            |   PE2   |
    +---+            |  +---+  |  +------+  |  +---+ |            +---+
    |CE1+-----ES1----+--+   |  |  |      |  |  | +--+---ES2/AC1--+CE2|
    +---+    (Root)  |  | E |  |  | MPLS |  |  | E |  | (Leaf/Root)+---+
                     |  | V |  |  |  /IP |  |  | V |  |
                     |  | I |  |  |      |  |  | I | |            +---+
                     |  |   |  |  |      |  |  | +--+---ES2/AC2--+CE3|
                     |  +---+  |  +------+  |  +---+  | (Leaf)   +---+
                     +---------+            +---------+

   Figure 3: Scenario 3

3 Operation for EVPN

   [RFC7432] defines the notion of ESI MPLS label used for split-horizon
   filtering of BUM traffic at the egress PE. Such egress filtering
   capabilities can be leveraged in provision of E-TREE services as seen
   shortly. In other words, [RFC7432] has inherent capability to support
   E-TREE services without defining any new BGP routes but by just
   defining a new BGP Extended Community for leaf indication as shown
   later in this document.

3.1 Known Unicast Traffic

   Since in EVPN, MAC learning is performed in control plane via
   advertisement of BGP routes, the filtering needed by E-TREE service
   for known unicast traffic can be performed at the ingress PE, thus
   providing very efficient filtering and avoiding sending known unicast
   traffic over MPLS/IP core to be filtered at the egress PE as done in
   traditional E-TREE solutions (e.g., E-TREE for VPLS).

   To provide such ingress filtering for known unicast traffic, a PE
   MUST indicate to other PEs what kind of sites (root or leaf) its MAC
   addresses are associated with by advertising a leaf indication flag
   (via an Extended Community) along with each of its MAC/IP
   Advertisement routes. The lack of such flag indicates that the MAC
   address is associated with a root site. This scheme applies to all
   scenarios described in section 2.

   Furthermore, for multi-homing scenario of section 2.2, where an AC is
   either root or leaf (but not both), the PE MAY advertise leaf
   indication along with the Ethernet A-D per EVI route. This
   advertisement is used for sanity checking in control-plane to ensure
   that there is no discrepancy in configuration among different PEs of
   the same redundancy group. For example, if a leaf site is multi-homed
   to PE1 and PE2, and PE1 advertises the Ethernet A-D per EVI
   corresponding to this leaf site with the leaf-indication flag but PE2
   does not, then the receiving PE notifies the operator of such
   discrepancy and ignores the leaf-indication flag on PE1. In other
   words, in case of discrepancy, the multi-homing for that pair of PEs
   is assumed to be in default "root" mode for that <ESI, EVI> or <ESI,
   EVI/VLAN>. The leaf indication flag on Ethernet A-D per EVI route
   tells the receiving PEs that all MAC addresses associated with this
   <ESI, EVI> or <ESI, EVI/VLAN> are from a leaf site. Therefore, if a
   PE receives a leaf indication for an AC via the Ethernet A-D per EVI
   route but doesn't receive a leaf indication in the corresponding MAC
   route, then it notifies the operator and ignores the leaf indication
   on the Ethernet A-D per EVI route.
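The two consistency rules in the paragraph above could be sketched as follows. The sketch assumes the receiving PE has already grouped the Ethernet A-D per EVI routes of the redundancy group by <ESI, EVI>. The function names, and the use of Python's `logging` module to stand in for "notify the operator", are illustrative assumptions.

```python
import logging

log = logging.getLogger("etree-sketch")


def effective_ad_leaf_flag(ad_leaf_by_pe):
    """ad_leaf_by_pe maps each PE of the redundancy group to the leaf flag
    it sent on its Ethernet A-D per EVI route for one <ESI, EVI>."""
    flags = set(ad_leaf_by_pe.values())
    if len(flags) > 1:
        # Discrepancy: notify the operator and fall back to default "root".
        log.warning("leaf-indication mismatch among %s; assuming root",
                    ", ".join(sorted(ad_leaf_by_pe)))
        return False
    return flags == {True}


def mac_is_leaf(ad_leaf, mac_route_leaf):
    """The leaf flag on the MAC/IP route drives filtering; an A-D leaf flag
    not confirmed by the MAC route is reported and ignored."""
    if ad_leaf and not mac_route_leaf:
        log.warning("A-D per EVI route says leaf, MAC route does not; "
                    "ignoring the A-D leaf flag")
    # A MAC route flagged leaf without an A-D leaf flag is not covered by
    # the quoted text; this sketch simply follows the MAC route there.
    return mac_route_leaf
```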

The procedure above should, I think, be rephrased to provide unambiguous interpretation in the case where a given MAC is being announced in more than one MAC/IP advertisement route, possibly carrying a different leaf indication (and even possibly from different ESes, or from PEs not advertising Ethernet A-D route).

Are you talking about MAC move where a MAC can move between Root and Leaf sites? If so, MAC mobility procedure takes precedence. I have added the following paragraph toward the end of this section: "In situation where MAC moves are allowed among Leaf and Root sites (e.g., non-static MAC), PEs can receive multiple MAC/IP advertisement routes for the same MAC address with different Leaf/Root indications (and possibly different ESIs for multi-homing scenarios). In such situations, MAC mobility procedures take precedence to first identify the location of the MAC before associating that MAC with a Root or a Leaf site."
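One possible reading of "MAC mobility procedures take precedence", sketched under the assumption that RFC 7432 route selection applies unchanged (highest MAC Mobility sequence number wins, lowest originating PE address breaks a tie) and that the leaf flag is read only from the winning route. `MacRoute` and `resolve_leaf` are hypothetical names.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class MacRoute:
    pe_address: str  # originating PE
    esi: str
    seq: int         # MAC Mobility sequence number (0 when the community is absent)
    leaf: bool       # E-TREE leaf indication carried with the route


def resolve_leaf(routes):
    """Locate the MAC first (highest sequence number, then lowest PE
    address on a tie), and only then read the leaf flag of that route."""
    best = max(routes, key=lambda r: (r.seq, -int(ip_address(r.pe_address))))
    return best.leaf
```

With a dual-homed ES whose two PEs flag it consistently, either route yields the same answer. The sketch does not help when the flags disagree, which is the residual case raised further down.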


   Tagging MAC addresses with a leaf indication enables remote PEs to
   perform ingress filtering for known unicast traffic - i.e., on the
   ingress PE, the MAC destination address lookup yields, in addition to
   the forwarding adjacency, a flag which indicates whether the target
   MAC is associated with a Leaf site or not.

Ditto, more or less: the procedure above should, I think, be rephrased to provide unambiguous interpretation in the case where a given MAC is being announced in more than one MAC/IP advertisement route, possibly carrying a different leaf indication.


The new paragraph will take care of it.

The new paragraph takes care of the MAC mobility case, but there possibly remains the case of a MAC being advertised in two distinct MAC/IP advertisement routes for the same dual-homed ES, in the case where this ES is flagged as Leaf or Root consistently from the two dual-homing PEs.

The ingress PE cross-
   checks this flag with the status of the originating AC, and if both
   are Leafs, then the packet is not forwarded.
   To support the above ingress filtering functionality, a new E-TREE
   Extended Community with a Leaf indication flag is introduced [section
   5.2]. This new Extended Community MUST be advertised with MAC/IP
   Advertisement route and MAY be advertised with an Ethernet A-D per
   EVI route as described above.
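The ingress check described above amounts to one extra bit per MAC table entry. Here is a minimal sketch. It assumes a dictionary keyed by destination MAC and uses a string to stand in for the forwarding adjacency; both are hypothetical. Unknown destinations are left to the BUM procedures and are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacEntry:
    adjacency: str  # forwarding adjacency; a plain string stands in for it here
    leaf: bool      # True if the MAC/IP route carried the leaf indication


def ingress_forward(mac_table, dst_mac, src_ac_is_leaf):
    """Known-unicast forwarding on the ingress PE. Returns the adjacency,
    or None when both the source AC and the destination MAC are Leafs."""
    entry = mac_table[dst_mac]  # known unicast: the MAC was learned via BGP
    if src_ac_is_leaf and entry.leaf:
        return None  # Leaf-to-Leaf: dropped before it reaches the core
    return entry.adjacency


table = {
    "00:00:5e:00:53:01": MacEntry("to-PE1", leaf=False),
    "00:00:5e:00:53:02": MacEntry("to-PE2", leaf=True),
}
```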

3.2 BUM Traffic

   For BUM traffic, it is not possible to perform filtering on the
   ingress PE, as is the case with known unicast, because of the multi-
   destination nature of the traffic.
Saying "it is not possible" without more explanation is not very useful (the reader may think about using RPF-like techniques on the egress PE). It seems to me more reasonable to formulate things in terms of "This specification does not provide support for filtering BUM traffic on the ingress PE", and avoid a sentence like the one above.
OK, Changed the sentence to:
"This specification does not provide support for filtering BUM traffic on the ingress PE because it is not possible to perform filtering of BUM traffic on the ingress PE, as is the case with known unicast described above, due to the multi-destination nature of BUM traffic."

Ok.

As such, the solution relies on
   egress filtering. In order to apply the proper egress filtering,
   which varies based on whether a packet is sent from a Leaf AC or a
   root AC, the MPLS-encapsulated frames MUST be tagged with an
   indication when they originated from a Leaf AC. In other words, leaf
   indication for BUM traffic is done at the granularity of AC. This can
   be achieved in EVPN through the use of an MPLS label where it can be
   used to either identify the Ethernet segment of origin per [RFC7432]
   (i.e., ESI label) or it can be used to indicate that the packet is
   originated from a leaf site (Leaf label).
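On the egress PE, the label carried with a BUM frame then selects which local ACs may receive it. The following is a rough sketch. It assumes each frame carries at most one of the two labels, and it uses a string placeholder for the Leaf label value, since the real value is allocated by the PEs. `LocalAC` and `egress_bum_targets` are hypothetical names.

```python
from typing import NamedTuple, Optional

LEAF_LABEL = "LEAF"  # hypothetical marker standing in for the Leaf label


class LocalAC(NamedTuple):
    name: str
    esi: str
    is_leaf: bool


def egress_bum_targets(local_acs, filter_label: Optional[str]):
    """filter_label is LEAF_LABEL, the ESI of origin (ESI label), or None."""
    out = []
    for ac in local_acs:
        if filter_label == LEAF_LABEL and ac.is_leaf:
            continue  # E-TREE: BUM from a Leaf AC is never sent to a Leaf AC
        if filter_label == ac.esi:
            continue  # RFC 7432 split horizon: not back to the ES of origin
        out.append(ac.name)
    return out


# A Leaf AC and a Root AC sharing ES2 (as in Figure 2), plus a Root AC on ES3.
acs = [LocalAC("AC1", "ESI-2", True),
       LocalAC("AC2", "ESI-2", False),
       LocalAC("AC3", "ESI-3", False)]
```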

   BUM traffic sent over a P2MP LSP or ingress replication, may need to
   carry an upstream assigned or downstream assigned MPLS label
   (respectively) for the purpose of egress filtering to indicate to the
   egress PEs whether this packet is originated from a leaf AC.

   The main difference between downstream and upstream assigned MPLS
   label is that in case of downstream assigned not all egress PE
   devices need to receive the label just like ingress replication
   procedures defined in [RFC7432].

   There are four scenarios to consider, as follows. In all these
   scenarios, the imposition PE imposes the right MPLS label associated
   with the originated Ethernet Segment (ES) depending on whether the
   Ethernet frame originated from a Root or a Leaf site on that Ethernet
   Segment (ESI or Leaf label). The mechanism by which the PE identifies
   whether a given frame originated from a Root or a Leaf site on the
   segment is based on the Ethernet Tag associated with the frame (e.g.,
   whether the frame is received on a leaf or a root AC).

First comment: it seems that the formulation should also support the case where an AC does not use .1q.


Agree. Changed the sentence to:

"The mechanism by which the PE identifies whether a given frame originated from a Root or a Leaf site on the segment is based on the AC identifier for that segment (e.g., Ethernet Tag of the frame for 802.1Q frames). Other mechanisms for identifying root or leaf (e.g., on a per MAC address basis) are beyond the scope of this document."

Ok.
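To illustrate the revised wording, the imposition PE can key its Root/Leaf decision on whatever identifies the AC. This covers untagged ACs as well as 802.1Q ones. The label values and port names below are hypothetical placeholders, and Root ACs are assumed to sit on a multi-homed ES so that an ESI label applies.

```python
# Hypothetical label values; real labels are allocated by the PEs.
LEAF_LABEL = 3001
ESI_LABEL = 3002

# AC identifier -> role. For 802.1Q ACs the identifier includes the VLAN
# (Ethernet Tag); an untagged AC is identified by its port alone (None here).
AC_ROLES = {
    ("port1", 100): "leaf",
    ("port1", 200): "root",
    ("port2", None): "leaf",
}


def label_to_impose(port, vlan=None):
    """Pick the filtering label from the AC the frame arrived on."""
    if AC_ROLES[(port, vlan)] == "leaf":
        return LEAF_LABEL
    return ESI_LABEL  # Root AC on a multi-homed ES: ESI label per RFC 7432
```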

(side comment: doing the identification based on the source MAC address would seem to allow BUM in the context of 2.3; it is out of the scope of my review to extend the scope of these specs, but I'm curious why it is not proposed....)
If we went for per MAC root/leaf identification, then this would have expanded the scope of DF election and egress filtering beyond that of RFC 7432. Currently, we don’t have any such requirements from operators and service providers.

Does the above mean that scenario 2.3 excludes BUM because the DF Election mechanism would not be compatible with the egress filtering mechanism ?

Providing the explanation in 2.3 would I think be helpful.

4.2 BUM Traffic

   For BUM traffic, the PEs must perform egress filtering. When a PE
   receives a MAC advertisement route (which will be used as a source B-
   MAC), it updates its Ethernet Segment egress filtering function
The "its Ethernet Segment egress filtering function" phrase makes it sound like we're talking about a well-known function defined somewhere.
If this is indeed the case, providing a reference would be in order.
If not, then explaining what this function is would be required.

Changed the sentence to:
"When a PE receives a MAC advertisement route (which will be used as a source B-MAC for BUM traffic), it updates its egress filtering (based on the source B-MAC address), as follows:"
(Are you talking about doing something similar to what 3.2 specifies for the non-PBB procedures ?)
Correct. Similar to 3.2 but based on B-MAC address.

Ok.

   (based on the source B-MAC address), as follows:

   - If the MAC Advertisement route indicates that the advertised B-MAC
   is a Leaf, and the local Ethernet Segment is a Leaf as well, then the
   source B-MAC address is added to the B-MAC filtering list.

Changed it to:
“… is added to its B-MAC list used for egress filtering."

Implicitly, we can guess that this "filtering list" is a list of things to exclude, rather than a list of things to include, but the text should, I think, be explicit.


Changed it as above.


We still don't know if the list is a list of B-MAC to reject or to accept ?

(filter out what is specified in the list vs. filter to keep only what is specified in the list)
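For what it is worth, here is how the rule above reads if the list is treated as a list of B-MACs to reject. That reading is an assumption, since it is exactly the question left open above. One filter instance is kept per local Ethernet Segment, and the class and method names are made up for the sketch.

```python
class BmacEgressFilter:
    """Egress filter for one local Ethernet Segment in PBB-EVPN.
    ASSUMPTION: the "B-MAC list" is read as a drop list."""

    def __init__(self, local_es_is_leaf):
        self.local_es_is_leaf = local_es_is_leaf
        self.drop_bmacs = set()

    def on_mac_route(self, bmac, advertised_leaf):
        # Leaf B-MAC received while the local ES is a Leaf: filter it.
        if advertised_leaf and self.local_es_is_leaf:
            self.drop_bmacs.add(bmac)

    def accept_bum(self, src_bmac):
        """True if a BUM frame with this source B-MAC may go to the local ES."""
        return src_bmac not in self.drop_bmacs
```

Under the opposite reading (an accept list), `on_mac_route` would add Root B-MACs instead and `accept_bum` would test membership, which is why the text needs to say which one is meant.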

5.2 PMSI Tunnel Attribute

   [RFC6514] defines PMSI Tunnel attribute which is an optional
   transitive attribute with the following format:

         +---------------------------------+
         |  Flags (1 octet)                |
         +---------------------------------+
         |  Tunnel Type (1 octet)          |
         +---------------------------------+
         |  MPLS Label (3 octets)          |
         +---------------------------------+
         |  Tunnel Identifier (variable)   |
         +---------------------------------+

   This draft uses all the fields per existing definition except for the
   following modifications to the Tunnel Type and Tunnel Identifier:

   When receiver ingress-replication label is needed, the high-order bit
   of the tunnel type field (C bit - Composite tunnel bit) is set while
   the remaining low-order seven bits indicate the tunnel type as
   before. When this C bit is set, the "tunnel identifier" field would
   begin with a three-octet label, followed by the actual tunnel
   identifier for the transmit tunnel.  PEs that don't understand the
   new meaning of the high-order bit would treat the tunnel type as an
   invalid tunnel type. For the PEs that do understand the new meaning
   of the high-order bit, if ingress replication is desired when sending BUM
   traffic, the PE will use the label in the Tunnel Identifier field
   when sending its BUM traffic.
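Here is a receiver-side parsing sketch of the modified attribute. It assumes the layout shown above. It also assumes that both the MPLS Label field and the extra label at the start of the Tunnel Identifier carry the label value in their high-order 20 bits, because the draft text only says "three-octet label". `PmsiTunnel` and `parse_pmsi` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

C_BIT = 0x80  # high-order bit of Tunnel Type: composite tunnel


@dataclass
class PmsiTunnel:
    flags: int
    tunnel_type: int         # low-order seven bits
    composite: bool
    mpls_label: int
    ir_label: Optional[int]  # receiver ingress-replication label, if any
    tunnel_id: bytes         # identifier of the transmit tunnel


def label_from_3_octets(b: bytes) -> int:
    # ASSUMPTION: label value in the high-order 20 bits of the 3-octet field.
    return int.from_bytes(b, "big") >> 4


def parse_pmsi(attr: bytes) -> PmsiTunnel:
    if len(attr) < 5:
        raise ValueError("PMSI Tunnel attribute shorter than 5 octets")
    flags, raw_type = attr[0], attr[1]
    composite = bool(raw_type & C_BIT)
    rest = attr[5:]
    ir_label = None
    if composite:
        if len(rest) < 3:
            raise ValueError("composite tunnel without a 3-octet label")
        ir_label = label_from_3_octets(rest[:3])
        rest = rest[3:]  # what remains is the actual transmit tunnel id
    return PmsiTunnel(flags, raw_type & 0x7F, composite,
                      label_from_3_octets(attr[2:5]), ir_label, rest)
```

A PE that does not implement the C bit would instead see a tunnel type such as 0x82 and, as the text says, treat it as invalid.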
Additionally, since RFC7385 has created a registry for PMSI Tunnel attribute tunnel types, taking the most significant bit from this field can't be done without a significant change of how this registry is organized (because now you can't take values in 0x7b-0x7f without colliding with values which are Experimental or Reserved).
Achieving the above requires an update of RFC7385, so I would suggest adding an 8.1 section saying this:
---
The "P-Multicast Service Interface Tunnel (PMSI Tunnel) Tunnel Types" registry in the "Border Gateway Protocol (BGP) Parameters" registry needs to be updated to reflect the use of the most significant bit to advertise the use of "composite tunnels" (section 5.2).
For this purpose, this document updates RFC7385.
The registry is to be updated by removing the entries for 0xFB-0xFE and 0xFF, and replacing them by:
- 0x7B-0x7E Reserved for Experimental Use [this document]
- 0x7F  Reserved [this document]
- 0x80-0xFF Not Allocatable, corresponds to Composite tunnel types [this document]
The allocation policy for values 0x00 to 0x7A is IETF Review [RFC5226 <https://tools.ietf.org/html/rfc5226>].
The range for experimental use is now 0x7B-0x7E, and values in this range are not to be assigned.
The status of 0x7F may only be changed through Standards Action [RFC5226 <https://tools.ietf.org/html/rfc5226>].
Done. Thanks for providing the text. It was very helpful.

Ok.
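Here is a small sketch of how a code point would be classified under the registry layout proposed above. It reflects the proposal as written here, not the registry IANA actually publishes, and the function name is hypothetical.

```python
def classify_tunnel_type(value: int) -> str:
    """Classify an 8-bit Tunnel Type code point under the proposed layout."""
    if not 0x00 <= value <= 0xFF:
        raise ValueError("Tunnel Type is a single octet")
    if value >= 0x80:
        return "not allocatable (composite)"  # C bit set
    if value == 0x7F:
        return "reserved"                     # Standards Action to change
    if value >= 0x7B:
        return "experimental"                 # 0x7B-0x7E, never assigned
    return "IETF Review"                      # 0x00-0x7A
```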

One thing: in the revised text, line breaks are missing for the bullet list ("- 0x7B-0x7E Reserved for Experimental Use [this document]- 0x7F Reserved [this document]- 0x80-0xFF Not Allocatable, corresponds to Composite tunnel types [this document]").

_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess
