Eric, Please see my responses to your comments inline marked w/ "Ali>"
On 11/9/17, 9:42 AM, "BESS on behalf of Eric C Rosen" <bess-boun...@ietf.org on behalf of ero...@juniper.net> wrote:

I have a number of comments on draft-sajassi-bess-evpn-mvpn-seamless-interop.

1. It seems that the proposal does not do correct ethernet emulation. Intra-subnet multicast only sometimes preserves MAC SA and IP TTL, sometimes not, depending upon the topology. TTL handling for inter-subnet multicast seems inconsistent as well, depending upon the topology. The proposal exposes the operator's internal network structure to the user, and will cause "LAN-only" applications to break. These concerns are acknowledged, then quickly dismissed based on wishful thinking. (In my experience, wishful thinking doesn't work out very well in routing.)

Ali> EVPN doesn't provide LAN service per IEEE 802.1Q but rather an emulation of LAN service. This document defines what that emulation means wrt intra-subnet & inter-subnet IP multicast traffic. I added section 5.1 to expand on that. BTW, TTL handling for inter-subnet IP multicast traffic is done consistently!

2. In order to do inter-subnet multicast in EVPN, the proposal requires L3VPN/MVPN configuration on ALL the EVPN PEs. This is required even when there is no need for MVPN/EVPN interworking. This is portrayed as a "low provisioning" solution!

Ali> Using MVPN constructs doesn't require additional configuration on EVPN PEs beyond the multicast configuration needed for IRB-mcast operation.

3. The draft claims that the exact same control plane should be used for EVPN and MVPN, despite the fact that MVPN's control plane is unaware of certain information that is very important in EVPN (e.g., EVIs, TagIDs). (This is largely responsible for point 1 above.) This is claimed to be a way of providing a "uniform solution". As we examine the problems that arise, perhaps this will be seen as more a case of "pounding square pegs into round holes".
When interworking between two domains, generally one gets a more flexible and robust scheme by maintaining clean interfaces and having well-defined points of attachment, not by entangling the internal protocols of one domain with the internal protocols of the other.

Ali> IP multicast described in the draft is done at the tenant's level (IP-VRF) and not at the BD level!! So, BD-level info such as TagIDs is not relevant.

4. The draft proposes to use the same tunnels for MVPN and EVPN, i.e., to have tunnels that traverse both the MVPN and the EVPN domains. Various "requirements" are stated that seem to require this solution. Somewhere along the line it was realized that this requirement cannot be met if MVPN and EVPN do not use the same tunnel types. So for this very common scenario, a completely different solution is proposed, that (a) tries to keep the EVPN control plane out of the MVPN domain, and vice versa, and (b) uses different tunnels in the two domains. Perhaps the "requirements" that suggest using a single cross-domain tunnel are not really requirements! And why would we want different solutions for different deployment scenarios? Yes, the solution needs to handle all the use cases, but we don't want to look at the use cases one at a time and design a different solution for each one.

Ali> There are SPDCs with MPLS underlay and there are SPDCs with VxLAN underlay. We need a solution that is optimum for both, just the same way that we need both ASBRs and GWs to optimize connectivity for inter-AS scenarios.

While the authors have realized that one cannot have cross-domain tunnels when EVPN uses VxLAN and MVPN uses MPLS, they do not seem to have acknowledged the multitude of other scenarios in which cross-domain tunnels cannot be used. For instance, MVPN may be using mLDP, while EVPN is using IR. Or MVPN may be using RSVP-TE P2MP while EVPN is using AR. Etc., etc.
I suspect that "different tunnel types" will be the common case, especially when trying to interwork existing MVPN and EVPN deployments.

Ali> This will be captured in the next rev., and that's why there is a need for both GWs and ASBRs.

The inability to use EVPN-specific tunnels also causes a number of specific problems when attempting to interwork with MVPN; these will be examined below.

5. A number of the draft's stated "requirements" seem to be entirely bogus.

a. In some cases, the "requirements" for optimality in one or another respect (e.g., routing, replication) are really only considerations that an operator should be able to trade off against other considerations. The real requirement is to be able to create a deployment scenario in which such optimality is achievable. Other deployment scenarios, that optimize for other considerations, should not be prohibited.

Ali> What deployment scenarios do you think are prohibited?

b. Many of the "requirements" are applied very selectively, e.g., the "requirement" for MVPN and EVPN to use the same set of multicast tunnels, and the requirement for there to be no "gateways".

Ali> That has been explained in the context of SPDC.

6. The gateway-based proposal for interworking MVPN and EVPN when they use different tunnel types is severely underspecified.

Ali> Agreed. This will be covered in the subsequent revisions.

One possible approach to this would be to have a single MVPN domain that includes the EVPN PEs, and to use MVPN tunnel segmentation at the boundary. While that is a complicated solution, at least it is known to work. However, that does not seem to be what is being proposed.

Ali> It is not clear to me exactly what you are suggesting here. At the boundary, is there any mcast address lookup or not?

Another approach would be to set up two independent MVPN domains and carefully assign RTs to ensure that routes are not leaked from one domain to another.
One would also have to ensure that the boundary points send the proper set of routes into the "other" domain. (This includes the unicast routes as well as the multicast routes.) And one would have to include a whole bunch of applicability restrictions, such as "don't use the same RR to hold routes of both domains". I think that's what's being proposed, but there isn't enough discussion of RT and RD management to be sure, and there isn't much discussion of what information the boundary points send into each domain.

Ali> I will expand on that with the RD and RT management aspects. But the intention is a single MVPN domain in which both EVPN and MVPN PEs participate.

7. The proposal requires that EVPN export a host route to MVPN for each EVPN-attached multicast source. It's a good thing that there is no requirement like "do not burden existing MVPN deployments with a whole bunch of additional host routes". Wait a minute, maybe there is such a requirement.

Ali> :-)

In fact, whether the host routes are necessary to achieve optimal routing depends on the topology. And this is a case where an operator might well want to sacrifice some routing optimality to reduce the routing burden on the MVPN nodes.

Ali> If there is mobility, then there is host route advertisement :-) If there is no mobility, then prefixes can be advertised.

8. The proposal simply does not work when MVPN receivers are interested in multicast flows from EVPN sources that are attached to all-active multi-homed ethernet segments.

Ali> This issue has been addressed in the new revision.

This issue is worth examining in detail. Suppose EVPN-PE1 and EVPN-PE2 are both attached to the same ethernet segment, using all-active multi-homing. Suppose there is a multicast source S on that segment. In such a case, (S,G1) traffic might arrive at PE1, while (S,G2) traffic might arrive at PE2. (Which PE gets a particular flow from S depends on LAG hashing algorithms over which we have no control.)
Now suppose that an MVPN PE, say PE3, needs to receive (S,G1) traffic. MVPN requires PE3 to select the "Upstream PE" for the (S,G1) traffic. PE3 does this by looking at the VRF Route Import EC on its best route to S. In order to receive the (S,G1) traffic, PE3 must select PE1, rather than PE2, as the Upstream PE. However, there is absolutely nothing in the MVPN specs or in this document to ensure that PE3 selects PE1 rather than PE2. Generally, an MVPN node will select PE2 if it is closer to PE2.

Perhaps the authors are under the impression that MVPN Source Active A-D routes can be used to solve this problem. That is not so. Vanilla MVPN nodes do not generally base their selection of the Upstream PE for (S,G) on the SA A-D routes.

Let me explain a little about the way SA A-D routes are used. There are two different MVPN "modes" that affect the use of SA A-D routes.

In one mode (sometimes known as 'rpt-spt' mode, and described in Section 13 of RFC 6514), an SA A-D route for (S,G) is originated by a PE when that PE receives a C-multicast route for (S,G).

In another mode (sometimes known as 'spt-only' mode, and described in Section 14 of RFC 6514), an SA A-D route for (S,G) is originated by a PE when that PE receives a PIM Register message for (S,G), or when that PE receives an MSDP SA message for (S,G). Note that in this mode, the PE originating the SA A-D route is not necessarily the best (or even a good) ingress PE for the flow.

- In both modes, if an egress PE receives a PIM Join (S,G) from a CE, its choice of ingress PE is never impacted by the SA A-D routes. Note that CEs send PIM Join(S,G) messages for both ASM and SSM groups.

- In spt-only mode, the SA A-D routes are used to discover sources, but not to select the ingress PE. (The selected ingress PE is not necessarily the one originating the SA A-D route.)
- The choice of ingress PE is impacted by the SA A-D routes for (S,G) only when (a) rpt-spt mode is being used, (b) the egress PE has received a PIM Join (*,G) from a CE, and (c) the egress PE has not received a PIM Join (S,G) from a CE. This is typically just a transient state, as the CE will generally emit a PIM Join(S,G) as soon as it sees any (S,G) traffic.

Bottom line: if a source is on an EVPN all-active multi-homed segment, MVPN receivers have no way to select the proper ingress PE. If the segment is n-way-homed, the MVPN PEs have just a 1/n chance of getting the traffic.

Of course, this problem could be eliminated if EVPN and MVPN didn't have to use the same tunnels. In that case, if an MVPN node selects the wrong ingress PE, the selected PE could obtain the traffic from the real ingress PE, and then relay it to the MVPN node. This might result in sub-optimal routing, but that's better than a black hole! Perhaps the gateway-based solution needs to be used whenever there is all-active multi-homing? ;-)

One could imagine modifying the MVPN installed base so that the SA A-D routes play more of a role in selecting the Upstream PE. However, I believe the requirement is to allow MVPN/EVPN interworking without modifying the existing MVPN nodes.

9. In the case where all the multicast sources for a given group are attached via EVPN, there is a very simple procedure for providing Join(*,G) functionality. This procedure makes use of EVPN-specific knowledge. Since the MVPN protocols cannot take advantage of the EVPN-specific knowledge, a more complicated procedure is needed when only MVPN protocols are used. This is explained further in the in-line comments.

10. Most of the problems above are the result of (a) trying to use the exact same control plane for both MVPN and EVPN, and (b) treating the case where both domains use the same tunnel type as the design center.
It would be better to keep clean interfaces between EVPN and MVPN, with clearly defined points of attachment. The proposal in draft-lin-bess-evpn-irb-mcast does this, and thus does not run into the above problems. That proposal also shows how the "optimal routing" requirements can be met, and how they can be traded off against other considerations. (In fairness, it must be acknowledged that both proposals are still works in progress. It's also worth noting that the two proposals have a lot in common.)

Ali> The proposal in evpn-irb-mcast is not ruled out.

A number of additional comments can be found in-line in the attachment. (I realize that some of them are repetitive, sorry.) Look for lines beginning "****". The above comments are also repeated at the front of the attachment.

Ali> I will go over your additional comments and address them separately.

Cheers,
Ali

_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess