Eric,
> -----Original Message-----
> From: Eric Rosen [mailto:[email protected]]
> Sent: Wednesday, June 12, 2013 3:18 PM
> To: Jeffrey (Zhaohui) Zhang
> Cc: [email protected]
> Subject: Comments on draft-zzhang-l3vpn-mvpn-bidir-ingress-replication-00.txt
>
> This is an interesting and potentially useful draft.
>
> The major issue I see is that while the draft does not say that it is
> applicable only to non-segmented IR P-tunnels, I don't see how it will
> work if IR P-tunnels are segmented at ASBRs or ABRs. The draft seems to use
> the term "Upstream Multicast Hop" (UMH) to mean "Upstream PE", which would
> be okay only if non-segmented IR P-tunnels are out of scope.
I assume you meant "if segmented IR P-tunnels are out of scope".
Indeed, the segmented P-tunnel case needs further study.
>
> Comments in-line, look for ****.
>
>
> Abstract
>
> RFC 6513 described a method to support bidirectional C-flow using
> "Partial Mesh of MP2MP P-Tunnels". This document describes how
> partial mesh of MP2MP P-Tunnels can be simulated with Ingress
> Replication, instead of a real MP2MP tunnel.
>
> **** I'd add to the abstract that this enables a Service Provider to use
> **** Ingress Replication to offer transparent BIDIR-PIM service to its VPN
> **** customers.
Sure.
> [draft-ietf-l3vpn-mvpn-bidir-05] assumes that an MP2MP P-tunnel is
> realized either via PIM-Bidir, or via MP2MP mLDP. Each of them would
> require signaling and state not just on PEs, but on the P routers as
> well. This document describes how the MP2MP tunnel can be simulated
> with a mesh of P2P or MP2P LSPs, i.e. Ingress Replication.
>
> **** What is really being proposed is to simulate a MP2MP P-tunnel with a
> **** set of P2MP P-tunnels, and then to use Ingress Replication to
> **** instantiate each such P-tunnel. The trick is how to get all the PEs to
> **** join all the necessary P2MP P-tunnels without requiring each PE to send
> **** a Leaf A-D route for each MP2MP P-tunnel to each other PE.
Correct.
>
> The
> advantage is that existing P2P/MP2P LSPs created for unicast can be
> used for multicast as well w/o introducing additional signaling or
> state in the core. While there may be concerns with traffic
> replication in the core, in many situations the traffic could be low-
> rate and/or sporadic and the advantage of signaling and state savings
> will outweigh the concerns with traffic replication, making Ingress
> Replication an applicable and attractive alternative.
>
> **** It might be better simply to say that this scheme has both the
> **** advantages and the disadvantages of Ingress Replication in general.
I can do that.
> 3.1. Control State
>
> If a PE, say PEx, is connected to a site of a given VPN, and that
> site hosts the C-RPA for some Bidir-PIM groups, i.e., the route to
> the C-RPA is through a local PE-CE interface,
>
> **** I think the actual condition is that PEx's next hop interface to some
> **** C-RPA is a VRF interface. This is not exactly the same thing as being
> **** connected to a site that "hosts a C-RPA".
OK.
>
> then PEx MUST
> advertises a (C-*,C-BIDIR) S-PMSI A-D route, regardless of whether it
> has any local Bidir-PIM join states corresponding to the C-RPA
> learned from its CEs. It MAY also advertise a (C-*,C-G-BIDIR) S-PMSI
>
> **** "advertise a" --> "advertise one or more"
OK.
>
> A-D route, just like how any other S-PMSI A-D routes are triggered
> (e.g, when the (C-*,C-G-BIDIR) traffic rate goes above a threshold).
>
> **** It's worth pointing out that applying a traffic rate threshold to a
> **** (C-*,C-G-BIDIR) state would require measuring the traffic in both
> **** directions, as the sources are not necessarily local. For IR
> **** P-tunnels, it might also be necessary to take the fanout into account.
I'll do that.
>
> Here the C-G-BIDIR refers to a C-G where G is a Bidir-PIM group, and
> the corresponding C-RPA is in the site that the PEx connects to.
>
> The S-PMSI A-D routes include a Provider Tunnel Attribute (PTA) with
>
> **** "PMSI Tunnel attribute"
I'll fix that.
>
> tunnel type set to Ingress Replication, with Leaf Information
> Required flag set, and with a downstream allocated MPLS label that
> other PEs in the same partition MUST use when sending relevant
> C-bidir flows to this PE.
>
> **** and with the Tunnel Identifier field in the PTA set to a routable
> **** address of the originator?
Yes.
>
> **** Can the MPLS label be shared with any other P-tunnels? Perhaps all
> **** the (C-*,C-BIDIR) and (C-*,C-G-BIDIR) S-PMSI A-D routes originated by a
> **** given PE can (optionally) share a label?
Subject to the anti-ambiguity rules for extranet.
>
> If some other PE, PEy, receives and imports into one of its VRFs such
> a (C-*,C-BIDIR) S-PMSI A-D route,
>
> **** I'm not sure just what is meant by "such a (C-*,C-BIDIR) S-PMSI A-D
> **** route". Does this mean "any (C-*,C-BIDIR) S-PMSI A-D route whose PTA
> **** specifies an IR P-tunnel"?
The wording "such a" means the S-PMSI A-D route mentioned in earlier text
(originated by PEx). Yes, we can use the text you suggested.
>
> and the VRF has any local Bidir-PIM
> join state that PEy has received from its CEs, and if PEy chooses PEx
> as its UMH wrt the C-RPA for those states, PEy MUST advertise a Leaf
> A-D route in response. Or, if PEy has received and imported into one
> of its VRFs a (C-*,C-BIDIR) S-PMSI A-D route from PEx before, then
> upon receiving in the VRF any local Bidir-PIM join state from its CEs
> with PEx being the UMH for those states' C-RPA, PEy MUST advertise a
> Leaf A-D route.
>
> The encoding of the Leaf A-D route is as specified in RFC 6514,
> except that the Route Targets are set to the same value as in the
> corresponding S-PMSI A-D route so that the Leaf A-D route will be
> imported by all VRFs that import the corresponding S-PMSI A-D route.
>
> **** I take the "except" clause to mean that RFC 6514's rules for setting
> **** the Leaf A-D route's RTs are not followed, and that the RTs are instead
> **** just copied from the S-PMSI A-D route. Is that the intention?
Yes.
>
> **** This means that the Leaf A-D route will not have an RT that is created
> **** from the Next Hop or P2MP Segmented Next Hop EC, which essentially
> **** means that all the P-tunnels will be non-segmented. Is that the
> **** intention?
The segmented P-tunnel case needs further study.
>
> **** RFC 6514 says that a PE/ASBR should take no action with regard to a
> **** Leaf A-D route unless that Leaf A-D route carries an IP Address
> **** Specific RT identifying the PE/ASBR. This draft should make it very
> **** clear that it is changing the RFC6514 procedures for the case where the
> **** route key of a Leaf A-D route identifies a (C-*,C-BIDIR) or a
> **** (C-*,C-G-BIDIR) S-PMSI.
Good catch. I'll make it clear.
>
> **** It's not clear to me how these procedures would coexist with the
> **** segmentation procedures that ordinarily occur at ABRs or ASBRs. Are
> **** ABRs/ASBRs supposed to modify the next hop and/or segmented next hop
> **** extended communities of the S-PMSI A-D routes that are about BIDIR
> **** groups? If not, but if segmentation is to be applied to S-PMSI A-D
> **** routes that are not about BIDIR groups, how will the ABRs/ASBRS know
> **** which are which? I.e., how exactly do the procedures of this draft
> **** coexist with the ABR/ASBR segmentation procedures that apply to
> **** non-BIDIR S-PMSIs?
Unfortunately, the segmented P-tunnel case needs further study.
>
> **** Note that there is up to now no requirement that an S-PMSI A-D route
> **** originated from a particular VRF carry any of that VRF's import RTs. I
> **** think that requirement needs to be added to this draft; otherwise the
> **** Leaf A-D routes originated in response to an S-PMSI A-D route won't
> **** necessarily be imported into the originating VRF of the S-PMSI A-D
> **** route. Alternatively, one could require that the Leaf A-D route carry
> **** an IP Address Specific RT identifying the S-PMSI route's originator (as
> **** learned from the NLRI) in its Global Administrator field.
I don't follow the alternative - all other PEs in the same partition need to
import the Leaf A-D routes, so carrying the RT identifying the S-PMSI route's
originator would not help.
Perhaps I need to explicitly point it out, but the RTs carried by the S-PMSI
A-D routes are just as specified at the end of section 12.1, RFC 6513. The
S-PMSI A-D routes will be imported by all PEs into the right VRFs, and by
copying the RTs to the Leaf A-D routes, the Leaf A-D routes will be imported into
the same set of VRFs?
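To illustrate the RT copying, here is a rough sketch only, not text from the
draft. The Route class, the VRF names and the RT values are all hypothetical,
and BGP import is reduced to "the VRF has an import RT that the route carries".
It just shows that a Leaf A-D route carrying the S-PMSI A-D route's RTs lands
in the same set of VRFs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    kind: str                 # "S-PMSI" or "Leaf" (illustrative labels only)
    originator: str           # PE that originated the route
    route_targets: frozenset  # RTs carried by the route


def make_leaf_route(spmsi: Route, leaf_originator: str) -> Route:
    # Assumption being illustrated: the RTs are copied from the S-PMSI A-D
    # route rather than being derived per the RFC 6514 Leaf A-D rules.
    return Route("Leaf", leaf_originator, spmsi.route_targets)


def importing_vrfs(route: Route, vrf_import_rts: dict) -> set:
    # Simplified import rule: a VRF imports the route if they share an RT.
    return {vrf for vrf, rts in vrf_import_rts.items()
            if rts & route.route_targets}


# Hypothetical VRFs and import RTs.
vrf_import_rts = {
    "PEx:vrf-red": frozenset({"RT:100:1"}),
    "PEy:vrf-red": frozenset({"RT:100:1"}),
    "PEz:vrf-red": frozenset({"RT:100:1"}),
    "PEz:vrf-blue": frozenset({"RT:100:2"}),
}

spmsi = Route("S-PMSI", "PEx", frozenset({"RT:100:1"}))
leaf = make_leaf_route(spmsi, "PEy")

spmsi_vrfs = importing_vrfs(spmsi, vrf_import_rts)
leaf_vrfs = importing_vrfs(leaf, vrf_import_rts)
```

The sketch ignores that a PE does not import its own routes; the point is only
that the two sets of importing VRFs coincide.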
>
> This is irrespective of whether from a receiving PE, PEz's
> perspective PEx (originator of the S-PMSI A-D route) is the UMH PE or
>
> **** Is "PEz" supposed to be "PEy"?
It is really PEz:
PEx originates S-PMSI A-D route; PEy responds with a Leaf A-D route; PEz
receives it.
>
> **** It would be better to say "Upstream PE" than "UMH" or "UMH PE", as the
> **** UMH could presumably be an ABR or ASBR.
Yes.
>
> not. The label in the PTA of the Leaf A-D route originated by PEy
> MUST be allocated specifically for PEx, so that when traffic arrives
> with that label, the traffic can be associated with the partition
> (represented by the PEx).
>
> **** This doesn't make it clear whether the label can be shared with other
> **** S-PMSIs (e.g., P2MP S-PMSIs) that originate from PEx. I think the
> **** answer is yes, at least in non-extranet cases.
Correct. I left it unspecified, because it would be covered by the extranet
spec.
>
> **** I think the draft should specify that the originator (the upstream PE)
> **** is identified from the "originating router's IP address" field of the
> **** NLRI of the S-PMSI A-D route.
OK.
>
> With PEy advertising Leaf A-D route only if it chooses the originator
> of the S-PMSI A-D route as its UMH, it won't receive traffic from PEs
>
> **** "UMH" vs. "Upstream PE" again.
Yup :-)
>
> in other partitions, so the label is actually useful only when PEy
> switches to a different UMH - it will stop accepting traffic before
> sending PEs stop sending it traffic (upon the receipt of its Leaf A-D
> route withdrawal).
>
> **** I don't see why it is said that "PEy ... won't receive traffic from PEs
> **** in other partitions". PE1, for example, may choose PE2 as the Upstream
> **** PE for (C-*,C-G1-BIDIR) while choosing PE3 as the Upstream PE for
> **** (C-*,C-G2-BIDIR). PE4 may make the opposite choices. In that case PE1
> **** and PE4 may both originate Leaf A-D routes with NLRI <C-*,C-BIDIR, PE2>
> **** and <C-*,C-BIDIR,PE3>. As a result, PE2 and PE3 would get the
> **** (C-*,C-G1-BIDIR) and (C-*,C-G2-BIDIR) flows from both partitions, and
> **** they would need to use the label to determine which copy of each C-flow
> **** to drop.
You are right. I'll fix it.
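For the record, a rough sketch of how I picture the label being used on the
receiving side. This is an illustration under my own assumptions, not text
from the draft: the class, the label values and the C-RPA names are
hypothetical, and the Upstream PE is selected per C-RPA. The PE allocates one
label per partition (per S-PMSI originator) and discards any copy that arrives
on a label for a partition other than the one it selected:

```python
class ReceivingPE:
    def __init__(self, name):
        self.name = name
        self._next_label = 1000        # arbitrary starting value
        self.label_to_partition = {}   # label -> Upstream PE naming the partition
        self.chosen_upstream = {}      # C-RPA -> Upstream PE selected for it

    def allocate_label(self, partition_root):
        # One label per partition, reused if already allocated.
        for label, root in self.label_to_partition.items():
            if root == partition_root:
                return label
        label = self._next_label
        self._next_label += 1
        self.label_to_partition[label] = partition_root
        return label

    def accept(self, label, c_rpa):
        # Accept only traffic whose label maps to the partition of the
        # Upstream PE selected for this C-RPA; drop everything else.
        partition = self.label_to_partition.get(label)
        return (partition is not None
                and partition == self.chosen_upstream.get(c_rpa))


pe1 = ReceivingPE("PE1")
label_pe2 = pe1.allocate_label("PE2")   # advertised in Leaf A-D for PE2's S-PMSI
label_pe3 = pe1.allocate_label("PE3")   # advertised in Leaf A-D for PE3's S-PMSI
pe1.chosen_upstream = {"C-RPA1": "PE2", "C-RPA2": "PE3"}

keep = pe1.accept(label_pe2, "C-RPA1")   # selected partition: forwarded
drop = pe1.accept(label_pe3, "C-RPA1")   # other partition: discarded
```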
>
>
> To speed up convergency (so that PEy starts
> receiving traffic from its new UMH immediately instead of waiting
> until the new Leaf A-D route corresponding to the new UMH is received
> by sending PEs), PEy MAY advertise a Leaf A-D route even if does not
> choose PEx as its UMH wrt the C-RPA. With that, it will receive
> traffic from all PEs, but some will arrive with the label
> corresponding to its choice of UMH while some will arrive with a
> different label, and the traffic in the latter case will be
> discarded.
>
> **** This might be useful as a form of live-live redundancy, but I don't
> **** think it is a very efficient way to speed up convergence when a
> **** receiving PE decides to switch the partition over which it receives a
> **** particular C-flow. When the receiving PE originates the Leaf A-D route
> **** for the new partition, it just has to keep forwarding the C-flow
> **** received from the old partition for a certain period of time, while
> **** discarding that C-flow when received from the new partition; when that
> **** period of time is up, can start discarding the C-flow when received
> **** from the old partition and start forwarding it when received from the
> **** new. (A larger delay should be imposed on transmitters, so they
> **** continue transmitting to a particular receiver for a period of time
> **** after that receiver withdraws its Leaf A-D route.) This would provide
> **** a more efficient "make before break" type of procedure.
I suppose the transmitter would have to send on both the original tunnel
(corresponding to the old UMH) and new tunnel (corresponding to the new UMH)
and then stop after a timeout?
On the other hand, when the UMH changes, continuing to send/receive on the old
tunnel may not be a good idea for PIM-bidir from a loop-avoidance point of view.
I don't have a concrete example, though.
I can reword the above paragraph to describe the live-live redundancy case.
> Whenever the (C-*,C-BIDIR) or (C-*,C-G-BIDIR) S-PMSI A-D route is
> withdrawn, or if PEy no longer chooses the originator PEx as its UMH
> wrt C-RPA and PEy only advertises Leaf A-D routes in response to its
> UMH's S-PMSI A-D route, or if relevant local join state is pruned,
> PEy MUST withdraw the corresponding Leaf A-D route.
>
> **** I think this "MUST" is too strong, and in fact it contradicts what is
> **** said above "PEy MAY advertise a Leaf A-D route even if does not
> **** choose PEx as its UMH wrt the C-RPA".
>
The above text does have an additional condition (see below), so it does not
conflict?
... and PEy only advertises Leaf A-D routes in response to its
UMH's S-PMSI A-D route, ...
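Put as a predicate (a sketch of my reading of the two paragraphs only; the
function and parameter names are made up for illustration), PEy keeps its
Leaf A-D route as long as the following holds, and withdraws it otherwise:

```python
def should_keep_leaf_route(spmsi_present: bool,
                           has_local_join: bool,
                           pex_is_upstream: bool,
                           only_respond_to_upstream: bool) -> bool:
    # The S-PMSI A-D route was withdrawn, or the local join state was pruned.
    if not spmsi_present or not has_local_join:
        return False
    # PEx is no longer the chosen upstream AND PEy is configured to respond
    # only to its upstream's S-PMSI A-D route.
    if only_respond_to_upstream and not pex_is_upstream:
        return False
    return True


# PEy uses the MAY option (responds to all partitions): no withdrawal
# when PEx stops being its upstream.
may_case = should_keep_leaf_route(True, True, False, False)

# PEy responds only to its upstream: withdrawal is required.
must_case = should_keep_leaf_route(True, True, False, True)
```

So the MUST only applies to a PE that is not using the MAY option.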
Thanks.
Jeffrey