RE: Comments on draft-zzhang-l3vpn-mvpn-bidir-ingress-replication-00.txt

Jeffrey (Zhaohui) Zhang Tue, 16 Jul 2013 13:14:10 -0700

Eric,

> -----Original Message-----
> From: Eric Rosen [mailto:[email protected]]
> Sent: Wednesday, June 12, 2013 3:18 PM
> To: Jeffrey (Zhaohui) Zhang
> Cc: [email protected]
> Subject: Comments on draft-zzhang-l3vpn-mvpn-bidir-ingress-replication-
> 00.txt
> 
> This is an interesting and potentially useful draft.
> 
> The major issue I see is that while the draft does not say that it is
> applicable only to non-segmented IR P-tunnels, I don't see how it will
> work if IR P-tunnels are segmented at ASBRs or ABRs.  The draft seems to use
> the term "Upstream Multicast Hop" (UMH) to mean "Upstream PE", which would
> be okay only if non-segmented IR P-tunnels are out of scope.


I assume you meant "if segmented IR P-tunnels are out of scope".

Indeed, the segmented P-tunnel case needs further study.

> 
> Comments in-line, look for ****.
> 
> 
> Abstract
> 
>    RFC 6513 described a method to support bidirectional C-flow using
>    "Partial Mesh of MP2MP P-Tunnels".  This document describes how
>    partial mesh of MP2MP P-Tunnels can be simulated with Ingress
>    Replication, instead of a real MP2MP tunnel.
> 
> **** I'd add to the abstract that this enables a Service Provider to use
> **** Ingress Replication to offer transparent BIDIR-PIM service to its VPN
> **** customers.

Sure.

>    [draft-ietf-l3vpn-mvpn-bidir-05] assumes that an MP2MP P-tunnel is
>    realized either via PIM-Bidir, or via MP2MP mLDP.   Each of them would
>    require signaling and state not just on PEs, but on the P routers as
>    well.  This document describes how the MP2MP tunnel can be simulated
>    with a mesh of P2P or MP2P LSPs, i.e.  Ingress Replication.
> 
> **** What is really being proposed is to simulate a MP2MP P-tunnel with a
> **** set of P2MP P-tunnels, and then to use Ingress Replication to
> **** instantiate each such P-tunnel.  The trick is how to get all the PEs to
> **** join all the necessary P2MP P-tunnels without requiring each PE to send
> **** a Leaf A-D route for each MP2MP P-tunnel to each other PE.

Correct.

> 
>    The
>    advantage is that existing P2P/MP2P LSPs created for unicast can be
>    used for multicast as well w/o introducing additional signaling or
>    state in the core.  While there may be concerns with traffic
>    replication in the core, in many situations the traffic could be low-
>    rate and/or sporadic and the advantage of signaling and state savings
>    will outweigh the concerns with traffic replication, making Ingress
>    Replication an applicable and attractive alternative.
> 
> **** It might be better simply to say that this scheme has both the
> **** advantages and the disadvantages of Ingress Replication in general.

I can do that.

> 3.1.  Control State
> 
>    If a PE, say PEx, is connected to a site of a given VPN, and that
>    site hosts the C-RPA for some Bidir-PIM groups, i.e., the route to
>    the C-RPA is through a local PE-CE interface,
> 
> **** I think the actual condition is that PEx's next hop interface to some
> **** C-RPA is a VRF interface.  This is not exactly the same thing as being
> **** connected to a site that "hosts a C-RPA".

OK.

> 
>    then PEx MUST
>    advertise a (C-*,C-BIDIR) S-PMSI A-D route, regardless of whether it
>    has any local Bidir-PIM join states corresponding to the C-RPA
>    learned from its CEs.  It MAY also advertise a (C-*,C-G-BIDIR) S-PMSI
> 
> **** "advertise a" --> "advertise one or more"

OK.

> 
>    A-D route, just like how any other S-PMSI A-D routes are triggered
>    (e.g., when the (C-*,C-G-BIDIR) traffic rate goes above a threshold).
> 
> **** It's worth pointing out that applying a traffic rate threshold to a
> **** (C-*,C-G-BIDIR) state would require measuring the traffic in both
> **** directions, as the sources are not necessarily local.  For IR
> **** P-tunnels, it might also be necessary to take the fanout into account.

I'll do that.

> 
>    Here the C-G-BIDIR refers to a C-G where G is a Bidir-PIM group, and
>    the corresponding C-RPA is in the site that the PEx connects to.
> 
>    The S-PMSI A-D routes include a Provider Tunnel Attribute (PTA) with
> 
> **** "PMSI Tunnel attribute"

I'll fix that.

> 
>    tunnel type set to Ingress Replication, with Leaf Information
>    Required flag set, and with a downstream allocated MPLS label that
>    other PEs in the same partition MUST use when sending relevant
>    C-bidir flows to this PE.
> 
> **** and with the Tunnel Identifier field in the PTA set to a routable
> **** address of the originator?

Yes.

> 
> **** Can the MPLS label be shared with any other P-tunnels?  Perhaps all
> **** the (C-*,C-BIDIR) and (C-*,C-G-BIDIR) S-PMSI A-D routes originated by a
> **** given PE can (optionally) share a label?

Subject to the anti-ambiguity rules for extranet.

> 
>    If some other PE, PEy, receives and imports into one of its VRFs such
>    a (C-*,C-BIDIR) S-PMSI A-D route,
> 
> **** I'm not sure just what is meant by "such a (C-*,C-BIDIR) S-PMSI A-D
> **** route".  Does this mean "any (C-*,C-BIDIR) S-PMSI A-D route whose PTA
> **** specifies an IR P-tunnel"?

The wording "such a" means the S-PMSI A-D route mentioned in earlier text 
(originated by PEx). Yes, we can use the text you suggested.

> 
>    and the VRF has any local Bidir-PIM
>    join state that PEy has received from its CEs, and if PEy chooses PEx
>    as its UMH wrt the C-RPA for those states, PEy MUST advertise a Leaf
>    A-D route in response.  Or, if PEy has received and imported into one
>    of its VRFs a (C-*,C-BIDIR) S-PMSI A-D route from PEx before, then
>    upon receiving in the VRF any local Bidir-PIM join state from its CEs
>    with PEx being the UMH for those states' C-RPA, PEy MUST advertise a
>    Leaf A-D route.
> 
>    The encoding of the Leaf A-D route is as specified in RFC 6514,
>    except that the Route Targets are set to the same value as in the
>    corresponding S-PMSI A-D route so that the Leaf A-D route will be
>    imported by all VRFs that import the corresponding S-PMSI A-D route.
> 
> **** I take the "except" clause to mean that RFC 6514's rules for setting
> **** the Leaf A-D route's RTs are not followed, and that the RTs are instead
> **** just copied from the S-PMSI A-D route.  Is that the intention?

Yes.

> 
> **** This means that the Leaf A-D route will not have an RT that is created
> **** from the Next Hop or P2MP Segmented Next Hop EC, which essentially
> **** means that all the P-tunnels will be non-segmented.  Is that the
> **** intention?

The segmented P-tunnel case needs further study.

> 
> **** RFC 6514 says that a PE/ASBR should take no action with regard to a
> **** Leaf A-D route unless that Leaf A-D route carries an IP Address
> **** Specific RT identifying the PE/ASBR.  This draft should make it very
> **** clear that it is changing the RFC6514 procedures for the case where the
> **** route key of a Leaf A-D route identifies a (C-*,C-BIDIR) or a
> **** (C-*,C-G-BIDIR) S-PMSI.

Good catch. I'll make it clear.

> 
> **** It's not clear to me how these procedures would coexist with the
> **** segmentation procedures that ordinarily occur at ABRs or ASBRs. Are
> **** ABRs/ASBRs supposed to modify the next hop and/or segmented next hop
> **** extended communities of the S-PMSI A-D routes that are about BIDIR
> **** groups?  If not, but if segmentation is to be applied to S-PMSI A-D
> **** routes that are not about BIDIR groups, how will the ABRs/ASBRS know
> **** which are which?  I.e., how exactly do the procedures of this draft
> **** coexist with the ABR/ASBR segmentation procedures that apply to
> **** non-BIDIR S-PMSIs?

Unfortunately, the segmented P-tunnel case needs further study.

> 
> **** Note that there is up to now no requirement that an S-PMSI A-D route
> **** originated from a particular VRF carry any of that VRF's import RTs.  I
> **** think that requirement needs to be added to this draft; otherwise the
> **** Leaf A-D routes originated in response to an S-PMSI A-D route won't
> **** necessarily be imported into the originating VRF of the S-PMSI A-D
> **** route.  Alternatively, one could require that the Leaf A-D route carry
> **** an IP Address Specific RT identifying the S-PMSI route's originator (as
> **** learned from the NLRI) in its Global Administrator field.

I don't follow the alternative: all other PEs in the same partition need to
import the Leaf A-D routes, so carrying an RT that identifies only the S-PMSI
route's originator would not help.

Perhaps I need to point it out explicitly, but the RTs carried by the S-PMSI
A-D routes are just as specified at the end of section 12.1 of RFC 6513. The
S-PMSI A-D routes will be imported by all PEs into the right VRFs, and by
copying the RTs to the Leaf A-D routes, the Leaf A-D routes will be imported
into the same set of VRFs?
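
To illustrate the RT copying described above, here is a minimal sketch. The
class names, field names and RT strings are made up for illustration and do
not come from the draft or from any implementation; the only point is that a
Leaf A-D route carrying the same RTs as the S-PMSI A-D route is imported by
exactly the VRFs that import the S-PMSI A-D route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPmsiRoute:
    originator: str           # "originating router's IP address" from the NLRI
    route_targets: frozenset  # RTs carried by the S-PMSI A-D route


@dataclass(frozen=True)
class LeafRoute:
    route_key: SPmsiRoute     # the S-PMSI A-D route being responded to
    originator: str           # the PE originating the Leaf A-D route
    route_targets: frozenset


def make_leaf_route(spmsi, leaf_pe):
    # Copy the RTs from the S-PMSI A-D route rather than building an
    # IP-address-specific RT that identifies a single upstream router.
    return LeafRoute(route_key=spmsi, originator=leaf_pe,
                     route_targets=spmsi.route_targets)


def vrf_imports(vrf_import_rts, route_targets):
    # Simplified import rule: a VRF imports a route when they share an RT.
    return bool(set(vrf_import_rts) & set(route_targets))
```

With this rule, any VRF for which vrf_imports() is true for the S-PMSI A-D
route is also true for the Leaf A-D route built from it.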

> 
>    This is irrespective of whether from a receiving PE, PEz's
>    perspective PEx (originator of the S-PMSI A-D route) is the UMH PE or
> 
> **** Is "PEz" supposed to be "PEy"?

It is really PEz:

PEx originates S-PMSI A-D route; PEy responds with a Leaf A-D route; PEz 
receives it.

> 
> **** It would be better to say "Upstream PE" than "UMH" or "UMH PE", as the
> **** UMH could presumably be an ABR or ASBR.

Yes.

> 
>    not.  The label in the PTA of the Leaf A-D route originated by PEy
>    MUST be allocated specifically for PEx, so that when traffic arrives
>    with that label, the traffic can be associated with the partition
>    (represented by the PEx).
> 
> **** This doesn't make it clear whether the label can be shared with other
> **** S-PMSIs (e.g., P2MP S-PMSIs) that originate from PEx.  I think the
> **** answer is yes, at least in non-extranet cases.

Correct. I left it unspecified, because it would be covered by the extranet 
spec.

> 
> **** I think the draft should specify that the originator (the upstream PE)
> **** is identified from the "originating router's IP address" field of the
> **** NLRI of the S-PMSI A-D route.

OK.

> 
>    With PEy advertising Leaf A-D route only if it chooses the originator
>    of the S-PMSI A-D route as its UMH, it won't receive traffic from PEs
> 
> **** "UMH" vs. "Upstream PE" again.

Yup :-)

> 
>    in other partitions, so the label is actually useful only when PEy
>    switches to a different UMH - it will stop accepting traffic before
>    sending PEs stop sending it traffic (upon the receipt of its Leaf A-D
>    route withdrawal).
> 
> **** I don't see why it is said that "PEy ... won't receive traffic from PEs
> **** in other partitions".  PE1, for example, may choose PE2 as the Upstream
> **** PE for (C-*,C-G1-BIDIR) while choosing PE3 as the Upstream PE for
> **** (C-*,C-G2-BIDIR).  PE4 may make the opposite choices.  In that case PE1
> **** and PE4 may both originate Leaf A-D routes with NLRI <C-*,C-BIDIR, PE2>
> **** and <C-*,C-BIDIR,PE3>.  As a result, PE2 and PE3 would get the
> **** (C-*,C-G1-BIDIR) and (C-*,C-G2-BIDIR) flows from both partitions, and
> **** they would need to use the label to determine which copy of each C-flow
> **** to drop.

You are right. I'll fix it.
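
A rough sketch of the label check in Eric's example, under the assumption
that the receiving PE allocates one label per upstream PE (partition) and
advertises it in the corresponding Leaf A-D route. The class, the method
names and the label values are hypothetical.

```python
class ReceivingPE:
    def __init__(self):
        self.label_to_partition = {}  # advertised label -> upstream PE
        self.upstream_for_group = {}  # C-G (bidir group) -> chosen upstream PE
        self._next_label = 1000       # arbitrary starting value

    def allocate_label(self, upstream_pe):
        # One label per partition, so that arriving traffic can be mapped
        # back to the partition it was sent on.
        label = self._next_label
        self._next_label += 1
        self.label_to_partition[label] = upstream_pe
        return label

    def accept(self, label, group):
        # Keep the copy only if it arrived on the partition rooted at the
        # upstream PE chosen for this group; drop every other copy.
        partition = self.label_to_partition.get(label)
        return partition is not None and partition == self.upstream_for_group.get(group)
```

For example, a PE that picks PE2 for G1 and PE3 for G2 accepts G1 traffic
only with the label allocated for PE2, and drops the G1 copy that arrives
with the label allocated for PE3.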

> 
> 
>    To speed up convergence (so that PEy starts
>    receiving traffic from its new UMH immediately instead of waiting
>    until the new Leaf A-D route corresponding to the new UMH is received
>    by sending PEs), PEy MAY advertise a Leaf A-D route even if it does not
>    choose PEx as its UMH wrt the C-RPA.  With that, it will receive
>    traffic from all PEs, but some will arrive with the label
>    corresponding to its choice of UMH while some will arrive with a
>    different label, and the traffic in the latter case will be
>    discarded.
> 
> **** This might be useful as a form of live-live redundancy, but I don't
> **** think it is a very efficient way to speed up convergence when a
> **** receiving PE decides to switch the partition over which it receives a
> **** particular C-flow.  When the receiving PE originates the Leaf A-D route
> **** for the new partition, it just has to keep forwarding the C-flow
> **** received from the old partition for a certain period of time, while
> **** discarding that C-flow when received from the new partition; when that
> **** period of time is up, it can start discarding the C-flow when received
> **** from the old partition and start forwarding it when received from the
> **** new.  (A larger delay should be imposed on transmitters, so they
> **** continue transmitting to a particular receiver for a period of time
> **** after that receiver withdraws its Leaf A-D route.)  This would provide
> **** a more efficient "make before break" type of procedure.

I suppose the transmitter would have to send on both the original tunnel
(corresponding to the old UMH) and the new tunnel (corresponding to the new
UMH), and then stop sending on the original tunnel after a timeout?

On the other hand, when the UMH changes, continuing to send/receive on the old
tunnel may not be a good idea for PIM-bidir from a loop-avoidance point of
view. I don't have a concrete example, though.

I can reword the above paragraph to describe the live-live redundancy case.
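
For reference, the receiver-side "make before break" behaviour suggested in
the comment can be sketched roughly as below. The hold time, the class and
the field names are assumptions for illustration only, and the sketch does
not address the loop-avoidance concern mentioned above.

```python
from dataclasses import dataclass


@dataclass
class PartitionSwitch:
    old_upstream: str     # upstream PE of the partition being left
    new_upstream: str     # upstream PE of the partition being joined
    leaf_sent_at: float   # time the Leaf A-D route for the new partition was sent
    receiver_hold: float  # how long to keep accepting the old partition

    def accepted_upstream(self, now):
        # Until the hold time expires, keep accepting the old partition;
        # afterwards accept only the new one.
        if now < self.leaf_sent_at + self.receiver_hold:
            return self.old_upstream
        return self.new_upstream

    def forward(self, arriving_from, now):
        # Forward a C-flow copy only if it arrived on the accepted partition.
        return arriving_from == self.accepted_upstream(now)
```

The transmitters would need a longer delay than receiver_hold before they
stop sending to a receiver that has withdrawn its Leaf A-D route, as noted
in the comment.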

>    Whenever the (C-*,C-BIDIR) or (C-*,C-G-BIDIR) S-PMSI A-D route is
>    withdrawn, or if PEy no longer chooses the originator PEx as its UMH
>    wrt C-RPA and PEy only advertises Leaf A-D routes in response to its
>    UMH's S-PMSI A-D route, or if relevant local join state is pruned,
>    PEy MUST withdraw the corresponding Leaf A-D route.
> 
> **** I think this "MUST" is too strong, and in fact it contradicts what is
> **** said above "PEy MAY advertise a Leaf A-D route even if it does not
> **** choose PEx as its UMH wrt the C-RPA".
> 

The text above does have an additional condition (see below), so I believe it
does not conflict:

     ... and PEy only advertises Leaf A-D routes in response to its
     UMH's S-PMSI A-D route, ...
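
Spelled out as a predicate, the withdrawal rule in the quoted paragraph reads
as follows. This is only a sketch; the function and parameter names are
invented for illustration.

```python
def must_withdraw_leaf(spmsi_withdrawn, pex_is_upstream,
                       leaf_only_for_upstream, has_local_join_state):
    # The "no longer chooses PEx" trigger applies only to a PEy that
    # advertises Leaf A-D routes solely in response to its upstream PE's
    # S-PMSI A-D route; a PEy using the MAY option is not covered by it.
    return (spmsi_withdrawn
            or (not pex_is_upstream and leaf_only_for_upstream)
            or not has_local_join_state)
```

So a PEy that uses the MAY option (leaf_only_for_upstream is false) is not
forced to withdraw merely because it picks a different upstream PE.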

Thanks.
Jeffrey
