Comments on draft-zzhang-l3vpn-mvpn-bidir-ingress-replication-00.txt

Eric Rosen Wed, 12 Jun 2013 12:18:57 -0700

This is an interesting and potentially useful draft.

The major issue I see is that while the draft does not say that it is
applicable only to non-segmented IR P-tunnels, I don't see how it will work
if IR P-tunnels are segmented at ASBRs or ABRs.  The draft seems to use the
term "Upstream Multicast Hop" (UMH) to mean "Upstream PE", which would be
okay only if segmented IR P-tunnels are out of scope.


Comments in-line, look for ****.


Abstract

   RFC 6513 described a method to support bidirectional C-flow using
   "Partial Mesh of MP2MP P-Tunnels".  This document describes how
   partial mesh of MP2MP P-Tunnels can be simulated with Ingress
   Replication, instead of a real MP2MP tunnel.

**** I'd add to the abstract that this enables a Service Provider to use
**** Ingress Replication to offer transparent BIDIR-PIM service to its VPN
**** customers.



1.  Introduction

   Section 11.2 of RFC 6513, "Partitioned Sets of PEs", describes two
   methods of carrying bidirectional C-flow traffic over a provider core
   without using the core as RPL or requiring Designated Forwarder
   election.

   With these two methods, all PEs of a particular VPN are separated
   into partitions, with each partition being all the PEs that elect the
   same PE as the UMH wrt the C-RPA.   A PE must discard bidirectional
   C-flow traffic from PEs that are not in the same partition as the PE
   itself.

   In particular, Section 11.2.3 of RFC 6513, "Partial Mesh of MP2MP
   P-Tunnels", guarantees the above discard behavior without using an
   extra PE Distinguisher label by having all PEs in the same partition
   join a single MP2MP tunnel dedicated to that partition and use it to
   transmit traffic.  All traffic arriving on the tunnel will be from
   PEs in the same partition, so it will be always accepted.

   RFC 6514 specifies BGP encodings and procedures used to implement
   MVPN as specified in RFC 6513, while the details related to MP2MP
   tunnels are specified in [draft-ietf-l3vpn-mvpn-bidir-05].

   [draft-ietf-l3vpn-mvpn-bidir-05] assumes that an MP2MP P-tunnel is
   realized either via PIM-Bidir, or via MP2MP mLDP.   Each of them would
   require signaling and state not just on PEs, but on the P routers as
   well.  This document describes how the MP2MP tunnel can be simulated
   with a mesh of P2P or MP2P LSPs, i.e.  Ingress Replication.

**** What is really being proposed is to simulate a MP2MP P-tunnel with a
**** set of P2MP P-tunnels, and then to use Ingress Replication to
**** instantiate each such P-tunnel.  The trick is how to get all the PEs to
**** join all the necessary P2MP P-tunnels without requiring each PE to send
**** a Leaf A-D route for each MP2MP P-tunnel to each other PE.

   The
   advantage is that existing P2P/MP2P LSPs created for unicast can be
   used for multicast as well w/o introducing additional signaling or
   state in the core.  While there may be concerns with traffic
   replication in the core, in many situations the traffic could be low-
   rate and/or sporadic and the advantage of signaling and state savings
   will outweigh the concerns with traffic replication, making Ingress
   Replication an applicable and attractive alternative.

**** It might be better simply to say that this scheme has both the
**** advantages and the disadvantages of Ingress Replication in general. 

   This document specifies the BGP signaling and procedures used to
   simulate "Partial Mesh of MP2MP P-Tunnels" with Ingress Replication.

...


3.  Operation

3.1.  Control State

   If a PE, say PEx, is connected to a site of a given VPN, and that
   site hosts the C-RPA for some Bidir-PIM groups, i.e., the route to
   the C-RPA is through a local PE-CE interface,

**** I think the actual condition is that PEx's next hop interface to some
**** C-RPA is a VRF interface.  This is not exactly the same thing as being
**** connected to a site that "hosts a C-RPA".

   then PEx MUST
   advertise a (C-*,C-BIDIR) S-PMSI A-D route, regardless of whether it
   has any local Bidir-PIM join states corresponding to the C-RPA
   learned from its CEs.  It MAY also advertise a (C-*,C-G-BIDIR) S-PMSI

**** "advertise a" --> "advertise one or more"   
   
   A-D route, just like how any other S-PMSI A-D routes are triggered
   (e.g., when the (C-*,C-G-BIDIR) traffic rate goes above a threshold).

**** It's worth pointing out that applying a traffic rate threshold to a
**** (C-*,C-G-BIDIR) state would require measuring the traffic in both
**** directions, as the sources are not necessarily local.  For IR
**** P-tunnels, it might also be necessary to take the fanout into account.
   
   Here the C-G-BIDIR refers to a C-G where G is a Bidir-PIM group, and
   the corresponding C-RPA is in the site that the PEx connects to.

   The S-PMSI A-D routes include a Provider Tunnel Attribute (PTA) with

**** "PMSI Tunnel attribute"
   
   tunnel type set to Ingress Replication, with Leaf Information
   Required flag set, and with a downstream allocated MPLS label that
   other PEs in the same partition MUST use when sending relevant
   C-bidir flows to this PE.

**** and with the Tunnel Identifier field in the PTA set to a routable
**** address of the originator?

**** Can the MPLS label be shared with any other P-tunnels?  Perhaps all
**** the (C-*,C-BIDIR) and (C-*,C-G-BIDIR) S-PMSI A-D routes originated by a
**** given PE can (optionally) share a label?
   
   If some other PE, PEy, receives and imports into one of its VRFs such
   a (C-*,C-BIDIR) S-PMSI A-D route,

**** I'm not sure just what is meant by "such a (C-*,C-BIDIR) S-PMSI A-D
**** route".  Does this mean "any (C-*,C-BIDIR) S-PMSI A-D route whose PTA
**** specifies an IR P-tunnel"?  

   and the VRF has any local Bidir-PIM
   join state that PEy has received from its CEs, and if PEy chooses PEx
   as its UMH wrt the C-RPA for those states, PEy MUST advertise a Leaf
   A-D route in response.  Or, if PEy has received and imported into one
   of its VRFs a (C-*,C-BIDIR) S-PMSI A-D route from PEx before, then
   upon receiving in the VRF any local Bidir-PIM join state from its CEs
   with PEx being the UMH for those states' C-RPA, PEy MUST advertise a
   Leaf A-D route.
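
**** As an illustration only (not text from the draft), the trigger
**** condition above can be sketched as a predicate over PEy's per-VRF
**** state.  The function and parameter names are hypothetical, and the
**** upstream PE is assumed to be selected per C-RPA.

```python
def must_originate_leaf(originator, imported_originators, joined_rpas, upstream_pe_for):
    """Return True if PEy must originate a Leaf A-D route toward `originator`.

    originator           -- PE that originated the (C-*,C-BIDIR) S-PMSI A-D route
    imported_originators -- originators whose S-PMSI A-D routes this VRF imported
    joined_rpas          -- C-RPAs for which the VRF holds Bidir-PIM join state
    upstream_pe_for      -- mapping C-RPA -> PE chosen as upstream for that C-RPA
    """
    if originator not in imported_originators:
        return False
    # At least one locally joined C-RPA must select this originator as upstream.
    return any(upstream_pe_for.get(rpa) == originator for rpa in joined_rpas)
```

**** Because the predicate is evaluated over current state, it covers both
**** orderings in the paragraph above: the S-PMSI A-D route arriving first,
**** or the join state arriving first.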

   The encoding of the Leaf A-D route is as specified in RFC 6514,
   except that the Route Targets are set to the same value as in the
   corresponding S-PMSI A-D route so that the Leaf A-D route will be
   imported by all VRFs that import the corresponding S-PMSI A-D route.

**** I take the "except" clause to mean that RFC 6514's rules for setting
**** the Leaf A-D route's RTs are not followed, and that the RTs are instead
**** just copied from the S-PMSI A-D route.  Is that the intention?

**** This means that the Leaf A-D route will not have an RT that is created
**** from the Next Hop or P2MP Segmented Next Hop EC, which essentially
**** means that all the P-tunnels will be non-segmented.  Is that the
**** intention?

**** RFC 6514 says that a PE/ASBR should take no action with regard to a
**** Leaf A-D route unless that Leaf A-D route carries an IP Address
**** Specific RT identifying the PE/ASBR.  This draft should make it very
**** clear that it is changing the RFC6514 procedures for the case where the
**** route key of a Leaf A-D route identifies a (C-*,C-BIDIR) or a
**** (C-*,C-G-BIDIR) S-PMSI.

**** It's not clear to me how these procedures would coexist with the
**** segmentation procedures that ordinarily occur at ABRs or ASBRs.  Are
**** ABRs/ASBRs supposed to modify the next hop and/or segmented next hop
**** extended communities of the S-PMSI A-D routes that are about BIDIR
**** groups?  If not, but if segmentation is to be applied to S-PMSI A-D
**** routes that are not about BIDIR groups, how will the ABRs/ASBRs know
**** which are which?  I.e., how exactly do the procedures of this draft
**** coexist with the ABR/ASBR segmentation procedures that apply to
**** non-BIDIR S-PMSIs?
   
**** Note that there is up to now no requirement that an S-PMSI A-D route
**** originated from a particular VRF carry any of that VRF's import RTs.  I
**** think that requirement needs to be added to this draft; otherwise the
**** Leaf A-D routes originated in response to an S-PMSI A-D route won't
**** necessarily be imported into the originating VRF of the S-PMSI A-D
**** route.  Alternatively, one could require that the Leaf A-D route carry
**** an IP Address Specific RT identifying the S-PMSI route's originator (as
**** learned from the NLRI) in its Global Administrator field.
   
   This is irrespective of whether from a receiving PE, PEz's
   perspective PEx (originator of the S-PMSI A-D route) is the UMH PE or

**** Is "PEz" supposed to be "PEy"?   

**** It would be better to say "Upstream PE" than "UMH" or "UMH PE", as the
**** UMH could presumably be an ABR or ASBR.

   not.  The label in the PTA of the Leaf A-D route originated by PEy
   MUST be allocated specifically for PEx, so that when traffic arrives
   with that label, the traffic can be associated with the partition
   (represented by the PEx).

**** This doesn't make it clear whether the label can be shared with other
**** S-PMSIs (e.g., P2MP S-PMSIs) that originate from PEx.  I think the
**** answer is yes, at least in non-extranet cases.

**** I think the draft should specify that the originator (the upstream PE)
**** is identified from the "originating router's IP address" field of the
**** NLRI of the S-PMSI A-D route.

   With PEy advertising Leaf A-D route only if it chooses the originator
   of the S-PMSI A-D route as its UMH, it won't receive traffic from PEs

**** "UMH" vs. "Upstream PE" again.

   in other partitions, so the label is actually useful only when PEy
   switches to a different UMH - it will stop accepting traffic before
   sending PEs stop sending it traffic (upon the receipt of its Leaf A-D
   route withdrawal).

**** I don't see why it is said that "PEy ... won't receive traffic from PEs
**** in other partitions".  PE1, for example, may choose PE2 as the Upstream
**** PE for (C-*,C-G1-BIDIR) while choosing PE3 as the Upstream PE for
**** (C-*,C-G2-BIDIR).  PE4 may make the opposite choices.  In that case PE1
**** and PE4 may both originate Leaf A-D routes with NLRI <C-*,C-BIDIR, PE2>
**** and <C-*,C-BIDIR,PE3>.  As a result, PE2 and PE3 would get the
**** (C-*,C-G1-BIDIR) and (C-*,C-G2-BIDIR) flows from both partitions, and
**** they would need to use the label to determine which copy of each C-flow
**** to drop.
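
**** A minimal sketch of the scenario above, seen from PE1, assuming that a
**** receiving PE allocates one label per upstream PE and records its
**** upstream PE choice per C-G.  The class, its method names, and the
**** label values are hypothetical and are not taken from the draft.

```python
class ReceivingPE:
    def __init__(self):
        self._next_label = 1000        # arbitrary illustrative label base
        self.label_for_upstream = {}   # upstream PE -> label sent in Leaf A-D route
        self.upstream_for_label = {}   # label -> upstream PE (the partition)
        self.chosen_upstream = {}      # C-G -> upstream PE chosen for its C-RPA

    def originate_leaf(self, upstream_pe):
        """Allocate (once) a label dedicated to this upstream PE."""
        if upstream_pe not in self.label_for_upstream:
            label = self._next_label
            self._next_label += 1
            self.label_for_upstream[upstream_pe] = label
            self.upstream_for_label[label] = upstream_pe
        return self.label_for_upstream[upstream_pe]

    def accept(self, label, group):
        """Accept only if the label's partition is the one chosen for the group."""
        partition = self.upstream_for_label.get(label)
        return partition is not None and partition == self.chosen_upstream.get(group)


# PE1 chooses PE2 for G1 and PE3 for G2, and joins both wildcard tunnels.
pe1 = ReceivingPE()
pe1.chosen_upstream = {"G1": "PE2", "G2": "PE3"}
label_pe2 = pe1.originate_leaf("PE2")
label_pe3 = pe1.originate_leaf("PE3")

g1_from_own_partition = pe1.accept(label_pe2, "G1")     # kept
g1_from_other_partition = pe1.accept(label_pe3, "G1")   # dropped
```

**** The point of the sketch is that the discard decision needs both the
**** label and the C-G, since one PE can be in different partitions for
**** different groups.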


   To speed up convergence (so that PEy starts
   receiving traffic from its new UMH immediately instead of waiting
   until the new Leaf A-D route corresponding to the new UMH is received
   by sending PEs), PEy MAY advertise a Leaf A-D route even if it does not
   choose PEx as its UMH wrt the C-RPA.  With that, it will receive
   traffic from all PEs, but some will arrive with the label
   corresponding to its choice of UMH while some will arrive with a
   different label, and the traffic in the latter case will be
   discarded.

**** This might be useful as a form of live-live redundancy, but I don't
**** think it is a very efficient way to speed up convergence when a
**** receiving PE decides to switch the partition over which it receives a
**** particular C-flow.  When the receiving PE originates the Leaf A-D route
**** for the new partition, it just has to keep forwarding the C-flow
**** received from the old partition for a certain period of time, while
**** discarding that C-flow when received from the new partition; when that
**** period of time is up, it can start discarding the C-flow when received
**** from the old partition and start forwarding it when received from the
**** new.  (A larger delay should be imposed on transmitters, so they
**** continue transmitting to a particular receiver for a period of time
**** after that receiver withdraws its Leaf A-D route.)  This would provide
**** a more efficient "make before break" type of procedure.
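
**** A rough sketch of that timing, with explicit timestamps instead of
**** real timers.  The class name, the function name, and the hold values
**** are assumptions made for illustration; the only constraint taken from
**** the comment above is that the transmitter hold exceeds the receiver
**** hold.

```python
class PartitionSwitch:
    """Receiver side: keep forwarding from the old partition for `rx_hold`."""

    def __init__(self, old_upstream, new_upstream, switch_time, rx_hold):
        self.old_upstream = old_upstream
        self.new_upstream = new_upstream
        self.switch_time = switch_time   # when the new Leaf A-D route was originated
        self.rx_hold = rx_hold

    def forwarding_upstream(self, now):
        if now < self.switch_time + self.rx_hold:
            return self.old_upstream
        return self.new_upstream

    def accept(self, arriving_partition, now):
        return arriving_partition == self.forwarding_upstream(now)


def transmitter_keeps_sending(withdraw_seen_time, tx_hold, now):
    """Transmitter side: keep sending for `tx_hold` after seeing the withdrawal."""
    return now < withdraw_seen_time + tx_hold


RX_HOLD = 2.0   # illustrative values only
TX_HOLD = 5.0   # larger than RX_HOLD, as suggested above
switch = PartitionSwitch("PE2", "PE3", switch_time=100.0, rx_hold=RX_HOLD)
```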


   Similar to the (C-*,C-BIDIR) case, if PEy receives and imports into
   one of its VRFs such a (C-*,C-G-BIDIR) S-PMSI A-D route, and PEy
   chooses PEx as its UMH wrt the C-RPA, and it has corresponding local
   (C-*,C-G-BIDIR) join state that it has received from its CEs in the
   VRF, PEy MUST advertise a Leaf A-D route in response.  Or, if PEy has
   received and imported into one of its VRFs a (C-*,C-G-BIDIR) S-PMSI
   A-D route before, then upon receiving its local (C-*,C-G-BIDIR) join
   state from its CEs in the VRF, it MUST advertise a Leaf A-D route.

   The encoding of the Leaf A-D route is as specified in RFC 6514,
   except that the Route Targets are set to the same as in the
   corresponding S-PMSI A-D route so that the Leaf A-D route will be
   imported by all VRFs that import the corresponding S-PMSI A-D route.

**** See prior comment re RTs.
   
   This is irrespective of whether from the receiving PE, PEz's

**** PEz?  Is that supposed to be PEy?

   perspective PEx (originator of the S-PMSI A-D route) is the UMH PE or
   not.  The label in the PTA of the Leaf A-D route originated by PEy
   MUST be allocated specifically for PEx, so that when traffic arrives
   with that label, the traffic can be associated with the partition
   (represented by the PEx).

**** See prior comment on label granularity.   

   Whenever the (C-*,C-BIDIR) or (C-*,C-G-BIDIR) S-PMSI A-D route is
   withdrawn, or if PEy no longer chooses the originator PEx as its UMH
   wrt C-RPA and PEy only advertises Leaf A-D routes in response to its
   UMH's S-PMSI A-D route, or if relevant local join state is pruned,
   PEy MUST withdraw the corresponding Leaf A-D route.

**** I think this "MUST" is too strong, and in fact it contradicts what is
**** said above "PEy MAY advertise a Leaf A-D route even if it does not
**** choose PEx as its UMH wrt the C-RPA".
