Hi Eric,

Thanks for your comments.  Please see the inline responses below.

Regards,
Kesavan

From: BESS <[email protected]> on behalf of Eric C Rosen 
<[email protected]>
Date: Monday, September 10, 2018 at 10:39 AM
To: "Ali Sajassi (sajassi)" <[email protected]>, Bess WG <[email protected]>
Subject: Re: [bess] Comments on draft-sajassi-bess-evpn-mvpn-seamless-interop

Eric> 1. It seems that the proposal does not do correct ethernet emulation.  
Intra-subnet multicast only sometimes preserves MAC SA and IP TTL, sometimes 
not, depending upon the topology.

Ali> EVPN doesn't provide LAN service per IEEE 802.1Q but rather an emulation 
of LAN service. This document defines what that emulation means.

The fact that the proposal doesn't do correct ethernet emulation cannot be 
resolved by having the proposal redefine "emulation" to mean "whatever the 
proposal does".

EVPN needs to ensure that whatever works on a real ethernet will work on the 
emulated ethernet as well; the externally visible service characteristics on 
which the higher layers may depend must be properly offered by the emulation.  
This applies to both unicast and multicast equally.

Otherwise anyone attempting to replace a real ethernet with EVPN will find that 
not every application and/or protocol working on the real ethernet will 
continue to work on the EVPN.

Kesavan>>  A solution has been proposed in the new revision to preserve MAC-SA 
and IP TTL for intra-subnet traffic.


Eric> TTL handling for inter-subnet multicast seems inconsistent as well, 
depending upon the topology.

Ali> BTW, TTL handling for inter-subnet IP multicast traffic is done consistently!

Consider the following in a pure MVPN environment:

- Source S is on subnet1, which is attached to PE1.

- Receivers R1 and R2 are on subnet2, which is attached to both PE1 and PE2.

- Subnet1 and subnet2 are different subnets.

Now every (S,G) packet will follow the same path: either (a) 
subnet1-->PE1-->subnet2 or (b) subnet1-->PE1-->PE2-->subnet2.

Both paths cannot be used at the same time, because L3 multicast will not allow 
both PE1 and PE2 to transmit the (S,G) flow to subnet2.  So an (S,G) packet 
received by R1 will always have the same TTL as the same packet received by R2. 
 TTL scoping will therefore work consistently; depending on the routing, and 
from the perspective of any given flow, the two subnets are either one hop away 
from each other, or two hops away from each other.

In the so-called "seamless-mcast" scheme, on the other hand, if R1 and R2 get 
the same (S,G) packet, each may see a different TTL.  Suppose R1 is on an ES 
attached to PE1 but not to PE2, S is on an ES attached to PE1 but not to PE2, 
and R2 is on an ES attached to PE2 but not to PE1.  Then a given (S,G) packet 
received by R1 will have a smaller TTL than the same packet received by R2, 
even though R1 and R2 are on the same subnet.
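
To make the contrast concrete, here is a toy Python sketch of the two scenarios above.  The initial TTL and the rule "each inter-subnet (L3) hop decrements TTL by 1" are illustrative assumptions, not taken from any spec:

```python
# Toy model of the TTL behavior described above: each PE that performs
# an inter-subnet (L3) hop decrements TTL by 1.

def ttl_after_path(initial_ttl, l3_hops):
    """TTL seen by a receiver after traversing l3_hops routed hops."""
    return initial_ttl - l3_hops

# Pure MVPN: L3 multicast allows only one PE to forward (S,G) onto
# subnet2, so R1 and R2 see the flow via the same routed path.
r1_mvpn = ttl_after_path(64, 1)   # subnet1 -> PE1 -> subnet2
r2_mvpn = ttl_after_path(64, 1)
assert r1_mvpn == r2_mvpn          # consistent TTL scoping

# Seamless-mcast scenario from the text: R1 behind PE1 (one L3 hop),
# R2 behind PE2 (two L3 hops), even though R1 and R2 share a subnet.
r1_seamless = ttl_after_path(64, 1)   # S -> PE1 -> R1
r2_seamless = ttl_after_path(64, 2)   # S -> PE1 -> PE2 -> R2
assert r1_seamless != r2_seamless     # same (S,G) packet, different TTL
```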

Note that the seamless-mcast proposal does not provide the behavior that would 
be provided by MVPN, despite the claim that it is "just MVPN".

This user-visible inconsistency may break any use of TTL scoping, and is just 
the sort of thing that tends to generate a stream of service calls from 
customers that pay attention to this sort of stuff.

In general, TTL should be decremented by 0 for intra-subnet and by 1 (within 
the EVPN domain) for inter-subnet.  Failure to handle the TTL decrement 
properly will break anything that depends upon RFC 3682 ("The Generalized TTL 
Security Mechanism").  Have you concluded that no use of multicast together 
with RFC 3682 will, now or in the future, ever need to run over EVPN?  I'd like 
to know how that conclusion is supported.  You may also wish to do a google 
search for "multicast ttl scoping".
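
For reference, the GTSM check (RFC 3682, since generalized by RFC 5082) works roughly as sketched below; the function name and the radius parameter are mine, for illustration only.  The point is that a protocol speaker emits packets with TTL 255 and the receiver accepts only packets whose TTL proves they crossed at most a configured number of routed hops, so any unexpected TTL decrement by the emulated ethernet makes legitimate packets fail the check:

```python
# Minimal GTSM-style acceptance test: sender uses TTL 255; receiver
# accepts only packets that can have crossed at most `radius` L3 hops.

def gtsm_accept(received_ttl, radius=0):
    return received_ttl >= 255 - radius

# A peer expected to be directly attached (radius 0):
assert gtsm_accept(255)        # intra-subnet, TTL untouched: accepted
assert not gtsm_accept(254)    # one unexpected decrement: dropped
```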

A related issue is that the number of PEs through which a packet passes should 
not be inferable by a tenant.  Any sort of multicast traceroute tool used by a 
tenant will give unexpected results if TTL is not handled properly; at the very 
least this will result in service calls.

The OISM proposal (as described in the irb-mcast draft) will decrement TTL by 1 
when packets go from one subnet to another, as an IP multicast frame is 
distributed unchanged to the PEs that need it, and its TTL is decremented by 1 
if an egress PE needs to deliver it to a subnet other than its source subnet.

Kesavan>>  TTL is handled very similarly to inter-subnet unicast traffic.  (In 
the EVPN-IRB model, TTL will get decremented once for hosts that are attached 
to the same PE, and twice if the hosts are connected behind two different PEs.)


The draft still makes the following peculiar claim:

   "Based on past experiences with MVPN over last dozen years for supported IP 
multicast applications, layer-3 forwarding of intra-subnet multicast traffic 
should be fine."

Since MVPN does not do intra-subnet multicast, experience with MVPN has no 
bearing whatsoever on the needs of intra-subnet multicast.

Kesavan>>    The above mentioned statement has been removed in the new revision.


Eric> 2. In order to do inter-subnet multicast in EVPN, the proposal requires 
L3VPN/MVPN configuration on ALL the EVPN PEs.  This is required even when there 
is no need for MVPN/EVPN interworking. This is portrayed as a "low 
provisioning" solution!

Ali> Using MVPN constructs doesn't require additional configuration on EVPN 
PEs beyond the multicast configuration needed for IRB-mcast operation.

I think you'll find that if you don't reconfigure all the BGP sessions to carry 
AFI/SAFIs 1/128, 2/128, 1/5, and 2/5, you'll have quite a bit of trouble 
running any of the native MVPN procedures ;-) This is perhaps the simplest 
example of additional configuration that is needed.

If doing MVPN/EVPN interworking, one needs to go to every EVPN PE and set up 
all the RTs used to control the distribution of routes within the L3VPN domain. 
 One has to consider whether the RDs already used by EVPN are distinct from the 
RDs already used by L3VPN.  One has to enable the tunneling mechanisms that are 
used in the L3VPN domain (hopefully the EVPN PEs can support those tunneling 
techniques).  If the L3VPN deployment has been set up with particular routing 
policies (special communities carried, or whatever), these need to be 
configured on every EVPN PE.  One needs to take account of whether the L3VPN 
deployment uses segmented P-tunnels or non-segmented P-tunnels, and whether it 
depends upon the use of (C-*,C-*) S-PMSI A-D routes or not.  One needs to 
configure whether the L3VPN is expecting procedures of RFC6514 Section 13 
("rpt-spt") or whether it is expecting procedures of RFC6514 Section 14 
("spt-only").  I think there are quite a few other configuration items (various 
timers, and additional stuff that I probably don't even know about) that may 
need to be coordinated with the L3VPN deployment with which one is attempting 
to interwork.

To do interworking between EVPN and L3VPN/MVPN, the L3VPN/MVPN stuff obviously 
needs to be configured at the interworking points.  The REQUIREMENT to do ALL 
this configuration at EVERY single EVPN PE is what seems excessive.

Even if one is not doing MVPN/EVPN interworking, all this stuff still has to be 
configured; one just wouldn't have to worry in that case about coordinating 
with a pre-existing MVPN deployment.  But no one ever called L3VPN a "low 
provisioning solution".  EVPN (unlike MVPN) has a fair amount of 
auto-provisioning built in, and one loses the advantages of that if one has to 
do MVPN provisioning on every PE.

Kesavan>>  L3VPN configuration is not needed if there is no need for MVPN/EVPN 
interworking.  But MVPN configuration is still required.
BTW, auto-provisioning can still be used.  MVPN config and RTs can be 
auto-configured in the EVPN fabric.

Eric> 3. The draft claims that the exact same control plane should be used for 
EVPN and MVPN, despite the fact that MVPN's control plane is unaware of certain 
information that is very important in EVPN (e.g., EVIs, TagIDs).

Ali> IP multicast described in the draft is done at the tenant's level (IP-VRF) 
and not BD level !! So, BD level info such as tagIDs are not relevant.

The failure to carry BD level info is what causes the ethernet emulation to be 
done incorrectly.  Remember that most of EVPN is taken from L3VPN, with 
modifications to add stuff that is needed to correctly emulate the ethernet 
service.

Certainly if you look at the control plane used by EVPN to distribute unicast 
IP addresses, you'll see that it does not "just use L3VPN", but instead has 
lots of EVPN-specific stuff.

It's also worth pointing out that the draft does not really use the exact same 
control plane as MVPN, as it seems to require that each IP host address be 
advertised in two routes (an EVPN-specific route and a VPN-IP route), and the 
EVPN-specific routes (types 2 or 5) are now required to carry attributes that 
are typically carried only by the VPN-IP routes.  Also, there are the intra-ES 
tunnels (discussed below), something that doesn't exist in MVPN.  And then 
there are those under-specified EVPN-specific 'gateways' (discussed below) that 
are used to connect tunnels of different types.

Kesavan>>  With respect to multicast route handling, the same signaling 
procedures are used between EVPN and MVPN PEs.  Yes, there are additional 
changes that are required in the EVPN control plane.  (Even OISM proposes 
changes in the existing EVPN control plane to accommodate the OISM solution.)


Eric> 4. The draft proposes to use the same tunnels for MVPN and EVPN, i.e., to 
have tunnels that traverse both the MVPN and the EVPN domains.  Various 
"requirements" are stated that seem to require this solution.  Somewhere along 
the line it was realized that this requirement cannot be met if MVPN and EVPN 
do not use the same tunnel types.  So for this very common scenario, a 
completely different solution is proposed, that (a) tries to keep the EVPN 
control plane out of the MVPN domain, and vice versa, and (b) uses different 
tunnels in the two domains.  Perhaps the "requirements" that suggest using a 
single cross-domain tunnel are not really requirements!

Ali> There are SPDCs with MPLS underlay and there are SPDCs with VxLAN 
underlay. We need a solution that is optimum for both. Just the same way that 
we need both ASBR and GWs to optimize connectivity for inter-AS scenarios.

My point is that the document states "requirements", but applies them very 
selectively and very inconsistently.  There is a "requirement" to "use the same 
tunnels for MVPN and EVPN", but there are many deployment scenarios in which 
this "requirement" simply cannot be met.  If the "requirement" were stated as 
"only use tunnels that provide value",  I'd have no problem with it.  It seems 
that many of the specified requirements were reverse engineered from the 
solution as it was originally proposed, and then are silently ignored whenever 
it is discovered that they can't be met.

Kesavan>>  The requirements section has been updated such that optimal 
replication shall be provided when both technologies use the same tunnel type.


Eric> 5a. In some cases, the "requirements" for optimality in one or another 
respect (e.g., routing, replication) are really only considerations that an 
operator should be able to trade off against other considerations.  The real 
requirement is to be able to create a deployment scenario in which such 
optimality is achievable.  Other deployment scenarios, that optimize for other 
considerations, should not be prohibited.

Ali> What deployment scenarios do you think are prohibited ?

The draft does not appear to support scenarios in which the MVPN/EVPN 
interworking procedures are confined to a subset of the EVPN PEs, and are not 
even visible to the majority of the EVPN PEs.

Kesavan>> The latest revision covers the above-mentioned use case.


Eric> While the authors have realized that one cannot have cross-domain tunnels 
when EVPN uses VxLAN and MVPN uses MPLS, they do not seem to have acknowledged 
the multitude of other scenarios in which cross-domain tunnels cannot be used.  
For instance, MVPN may be using mLDP, while EVPN is using IR.  Or EVPN may be 
using "Assisted Replication", which does not exist in MVPN.  Or MVPN may be 
using PIM while EVPN is using RSVP-TE P2MP.  Etc., etc.  I suspect that 
"different tunnel types" will be the common case, especially when trying to 
interwork existing MVPN and EVPN deployments.

I note that the latest rev of the draft still does not take this into account.

Eric> The gateway-based proposal for interworking MVPN and EVPN when they use 
different tunnel types is severely underspecified.

Ali> Agreed. This will be covered in the subsequent revisions.

It doesn't seem to be in the latest revision.

Eric> One possible approach to this would be to have a single MVPN domain that 
includes the EVPN PEs, and to use MVPN tunnel segmentation at the boundary. 
While that is a complicated solution, at least it is known to work. However, 
that does not seem to be what is being proposed.

Ali> It is not clear to me exactly what you are suggesting here. At the 
boundary, is there any mcast address lookup or not?

If I were working on a proposal like the one in seamless-multicast, I would 
consider whether the MVPN inter-AS segmented P-tunnels feature could be 
leveraged at the border nodes between domains that use different tunnel types.  
After all, one of the main purposes of MVPN inter-AS segmentation is to connect 
domains that use different tunnel types.  Done properly, that does not require 
any IP lookups at the ASBRs.  The draft seems to be trying to reinvent MVPN 
P-tunnel segmentation from scratch.  This is a very intricate part of the MVPN 
specs and you can't just make it up as you go along.

Here is just a selection of some of the problems with section 10.1 ("Control 
Plane Interconnect") of the -02 revision:

- Much of the document seems to assume that the RTs used in the MVPN domain 
will be the same as the RTs used in the EVPN domain.  If that is the case, all 
the A-D routes from one domain will propagate into the other.  This does not 
appear to be compatible with the sketchy description of "gateway" behavior 
given in Section 10.

- Section 10.1 states that the RD in a Source Active A-D route needs to be 
changed when a such a route is re-originated by a gateway.  Unfortunately, MVPN 
requires that the SA A-D route for (S,G) have the same RD as the unicast route 
for S.  So you would need to block all the IPVPN routes at the gateway and 
reoriginate them with new RDs.  The spec fails to mention this.  Note that this 
is not even possible if the EVPN PEs share the RTs of the MVPN domain.

- Interesting effects could arise if an EVPN PE chooses a gateway as the UMH, 
but the gateway chooses an EVPN PE as the UMH.  Can you demonstrate that this 
is impossible?

- Section 10.1 says that the C-multicast routes originated by the gateway carry 
the "exported RT list on the IP-VRF".  In MVPN, C-multicast routes do not carry 
the exported RT list, they carry an RT created from the VRF Route Import EC of 
the Selected UMH route.

- Section 10.1 talks about putting the BGP Encapsulation EC on the C-multicast 
routes sent into the MVPN domain.  However, MVPN does not make any use of this 
EC.

- Section 10.1 states that the S-PMSI A-D routes just propagate from one domain 
to the other, but with some unspecified "modifications".

- Leaf A-D routes are not discussed at all, nor is the setting of the LIR flag 
in the PMSI Tunnel attribute.

- Inter-AS I-PMSI A-D routes are not discussed.

This section is still severely underspecified.  It seems to be inventing a new 
way of interconnecting two L3VPN/MVPN domains, but it's not "option A", "option 
B", or "option C", and it's not "segmented P-tunnels".  So what is it exactly, 
and how do we know it works?

Have you thought about cases where multiple domains (i.e., more than 2) using 
different tunnel types are interconnected, perhaps in a cycle?

I think the issue of how to interwork domains that use different tunnel types 
is quite important.  If one wants to interwork an MVPN domain that uses 
mLDP-based P2MP LSPs with an EVPN domain that uses IR, I don't think one wants 
to tell customers that interoperability requires them to start using mLDP 
inside the EVPN domain.  If one is using assisted replication (AR) within the 
EVPN domain, I don't think anyone will want to hear "sorry, AR is not supported 
by MVPN".  I don't think the interworking between two domains can be called 
"seamless" if one has to change the tunnel types of either domain.  But the 
details for how to do the interworking between different tunnel types just 
don't seem to be present.

Furthermore, it is pretty clear that some sort of gateway is going to be needed 
to provide interoperability with RFC 7432 nodes that do not implement MVPN; 
this needs to be addressed as well.

Kesavan>>  The latest revision covers the gateway-based proposal.  Some updates 
are still required w.r.t. MVPN-EVPN interworking, which will be taken care of 
in the next revision.

Eric> Another approach would be to set up two independent MVPN domains and 
carefully assign RTs to ensure that routes are not leaked from one domain to 
another.  One would also have to ensure that the boundary points send the 
proper set of routes into the "other" domain.  (This includes the unicast 
routes as well as the multicast routes.)  And one would have to include a whole 
bunch of applicability restrictions, such as "don't use the same RR to hold 
routes of both domains".  I think that's what's being proposed, but there isn't 
enough discussion of RT and RD management to be sure, and there isn't much 
discussion of what information the boundary points send into each domain.

Ali> I will expand on that with the RD and RT management aspects.  But the 
intention is with a single MVPN domain where both EVPN and MVPN PEs participate.

Note that the use of a single RT by both MVPN and EVPN nodes will cause routes 
to be distributed throughout the "single MVPN domain", with no opportunity for 
a gateway to modify the routes.  But section 10.1 does seem to require a 
gateway to modify routes in order to connect tunnels of different types.

Kesavan>> Yes, the gateway needs to re-originate routes to connect different 
tunnel types.  Please check the next version.


Eric> 7. The proposal requires that EVPN export a host route to MVPN for each 
EVPN-attached multicast source.  It's a good thing that there is no requirement 
like "do not burden existing MVPN deployments with a whole bunch of additional 
host routes".  Wait a minute, maybe there is such a requirement.

Eric> In fact, whether the host routes are necessary to achieve optimal routing 
depends on the topology.  And this is a case where an operator might well want 
to sacrifice some routing optimality to reduce the routing burden on the MVPN 
nodes.

Ali> If there is mobility, then there is host route advertisement If there is 
no mobility, then prefixes can be advertised.

It seems to me that this is simply not true.  Consider the following example:

- BD1 has subnet 192.168.168.0/24.

- BD1 exists on ES1, which is attached to PE1.

- BD2 exists on ES2, which is attached to PE2.  (ES1 and ES2 are not the same 
ES.)

- On BD1/ES1, there are hosts 192.168.168.1, 192.168.168.103, 192.168.168.204.

- On BD2/ES2 there are hosts 192.168.168.2, 192.168.168.104, 192.168.168.203.

Assume there is no mobility.

In this scenario, I don't see how either PE1 or PE2 can advertise any prefix 
shorter than a /32.  And I don't see how one will prevent all these /32 routes 
from being distributed to all the MVPN nodes.

The fundamental issue here is that while IP addresses can be aggregated on a 
per-BD basis, they cannot be aggregated on a per-ES basis.
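
This can be checked mechanically.  The sketch below (illustrative only; it reads the third BD2/ES2 host as 192.168.168.203, and the helper function is mine) shows that the shortest single prefix covering all of ES1's hosts is the /24 itself, which also attracts every ES2 host:

```python
import ipaddress

es1 = [ipaddress.ip_address(a) for a in
       ("192.168.168.1", "192.168.168.103", "192.168.168.204")]
es2 = [ipaddress.ip_address(a) for a in
       ("192.168.168.2", "192.168.168.104", "192.168.168.203")]

def smallest_covering_prefix(addrs):
    """Longest-prefix (i.e. smallest) single network containing all addrs."""
    for plen in range(32, -1, -1):
        net = ipaddress.ip_network(f"{addrs[0]}/{plen}", strict=False)
        if all(a in net for a in addrs):
            return net

agg = smallest_covering_prefix(es1)
assert agg == ipaddress.ip_network("192.168.168.0/24")
# ...but that aggregate also covers every host on the other ES:
assert all(b in agg for b in es2)
```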

I don't think you get "seamless" interworking by requiring all the MVPN nodes 
to receive an unbounded number of host routes.

Kesavan>> With the seamless interop model, a single copy of data traffic serves 
both MVPN and EVPN PEs.  But host routes need to be advertised to MVPN PEs 
that are directly attached to the fabric.
In the gateway model, summarized routes can be advertised to MVPN PEs.


Eric> 8. The proposal simply does not work when MVPN receivers are interested 
in multicast flows from EVPN sources that are attached to all-active 
multi-homed ethernet segments.

Ali> This issue has been addressed in the new revision.

Yes, this is an improvement.

Suppose PE1 receives an (S,G) IP multicast frame over a local AC from BD1/ES1.  
And suppose PE2,...,PEn are also attached to ES1.  Per the new revision, PE1 
transmits a copy of the frame on an EVPN-specific tunnel to PE2,...,PEn (an 
"intra-ES1" tunnel), as well as transmitting a copy of the contained IP 
datagram on whatever MVPN tunnel it uses to carry (S,G) packets.  Now any EVPN 
PE attached to the source ES can be selected as the UMH by an MVPN node, 
because all such EVPN PEs get the (S,G) frames and can forward them to 
MVPN receivers.

It's good to see the draft recognizing that IP multicast frames do need to be 
transmitted as frames on EVPN-specific tunnels, in addition to being 
transmitted as packets on MVPN tunnels.  (Of course this solution violates the 
stated "requirement" that a given IP multicast packet not be transmitted on two 
different tunnels. Sigh, another example of "requirements" being applied 
inconsistently.)

However, there are still several problems with this solution.

No control plane is described to support this intra-ES tunneling.  Is that "for 
the next revision"? ;-)

There's a suggestion that this solution is trivial, because no one would ever 
home an ES to more than two PEs, and therefore you just have to unicast a copy 
to the other PE.

But the PE receiving a frame has to figure out whether the frame was sent to it 
on an intra-ES tunnel or not, and if so, which ES the tunnel is associated 
with.  It is not clear how the receiving PE is supposed to make this 
determination.  One needs to say something more than "just use ingress 
replication".

The draft also suggests that "multi-homed" always means "dual-homed", which I 
don't think is acceptable.

Note also that a scheme like this causes EVERY (S,G) frame to get sent to EVERY 
PE that is attached to S's source ES.  This happens even if there are NO 
receivers anywhere interested in (S,G) at all.  In effect, the LAG hashing 
algorithm is defeated.  If a switch is multi-homed to n PEs, it uses a LAG 
hashing algorithm to ensure that any given packet is sent to just one of those 
PEs.  Then one EVPN PE gets the packet and sends it to the other n-1 PEs, who 
have to treat the packet as if it had just arrived on the AC from the 
multi-homed switch.  It would be better to have a "pull model" where PEx gets 
the (S,G) packet from PE1 only if some MVPN PE has sent a C-multicast (S,G) or 
(*,G) route to PEx.

In addition, the latest rev of the draft is still confused about the way UMH 
selection is done.  It seems to assume all the PEs will select the same 
"Upstream PE" for a given (S,G).  While this is one possible option (generally 
referred to as Single Forwarder Selection), it is not required, and I believe 
the most common deployment scenario is to use the "Installed UMH Route" as the 
"Selected UMH Route".  (See section 5.1.3 of RFC 6513.)  This means that it is 
always possible for a PE to receive more than one copy of an (S,G) packet, and 
the PE must therefore always be able to apply the "discard from the wrong PE" 
procedures of RFC 6513 Section 9.1.1.

Suppose for example that EVPN-PE1 transmits its IP multicast frames on an 
I-PMSI that is instantiated by a P2MP LSP.  EVPN-PE2 will have to join that 
I-PMSI.  If PE1 and PE2 are both attached to BD1/ES1, then when PE1 gets an 
(S,G) IP multicast frame from BD1/ES1, PE2 will get two copies: one on the 
intra-ES1 tunnel from PE1 and one on the I-PMSI tunnel from PE1.  PE2 will 
probably choose itself as the "Upstream PE" for (S,G), in which case it needs 
to discard the copy that arrives on the I-PMSI tunnel from PE1, while accepting 
the copy that arrives on the intra-ES1 tunnel from PE1.  (If PE2 for some 
reason chose PE1 as the Upstream PE for (S,G), it would have to discard the 
copy arriving on the intra-ES1 tunnel and accept the copy arriving on the 
I-PMSI tunnel.)  The draft seems to imply, incorrectly, that the "discard from 
the wrong PE" procedure is not necessary.
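
The shape of the "discard from the wrong PE" check (RFC 6513 Section 9.1.1) can be sketched as follows.  The table contents and the tunnel labels ("intra-ES1", "I-PMSI-PE1") are illustrative assumptions matching the example above, not protocol constants:

```python
# For each (S,G), the (ingress PE, tunnel) pair from which this egress PE
# has decided to accept traffic, per its Selected Upstream Multicast Hop.
# Here PE2 has chosen itself as Upstream PE, so it accepts the copy that
# arrives as a frame on the intra-ES1 tunnel from PE1.
expected = {("S1", "G1"): ("PE1", "intra-ES1")}

def accept(flow, ingress_pe, tunnel):
    """Deliver the packet only if it arrived from the expected place."""
    return expected.get(flow) == (ingress_pe, tunnel)

assert accept(("S1", "G1"), "PE1", "intra-ES1")        # expected copy: deliver
assert not accept(("S1", "G1"), "PE1", "I-PMSI-PE1")   # duplicate: discard
```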

The "discard from the wrong PE" procedures are also needed to handle the case 
where the source is at a site homed to two or more MVPN PEs, and there are MVPN 
receivers that do not do single forwarder selection.  This may cause some 
packets to appear on multiple I-PMSIs, and each EVPN-PE will have to join all 
the I-PMSIs, of course.

(The use of S-PMSIs rather than I-PMSIs does not eliminate this problem.  A 
given S-PMSI from PE1 might carry a flow that PE2 needs from PE1, and it might 
also carry a flow that PE2 is getting on an S-PMSI from PE3.)

Please note that if MPLS ingress replication is being used, the "discard from 
the wrong PE" functionality requires that the egress PE be able to tell from a 
packet's encapsulation whether the packet came from the wrong ingress PE.

If the MVPN nodes are using the "extranet" feature (RFC 7900), "discard from 
the wrong PE" is not actually sufficient; one needs to "discard from the wrong 
ingress VRF".

Since there is no clean layering between MVPN and EVPN protocols in this 
proposal, every little nit and corner case of MVPN has to be examined to make 
sure it will also work in the EVPN domain.

Another problem: according to the draft, if an EVPN PE, say PE1, learns of a 
source via a locally attached all-active multi-homed ES, it will originate an 
IP route for that source.  Consider another PE, say PE2, attached to the same 
multi-homed ES.  When PE2 receives that IP route from PE1, PE2 will then 
originate its own IP route for that source.  Since PE1 receives PE2's route, it 
is not clear how the route ever gets withdrawn.  If PE1 stops seeing the local 
traffic, it will still see PE2's route, and hence will still originate its own 
route.  One might think this is easily fixed by attaching to PE1's route an EC 
that declares that route to be "authoritative"; PE2's route would not have that 
EC.  Note though that the adding or removal of this "authoritative" EC will 
cause some churn that will be visible to the MVPN-only nodes, even though it 
does not provide them with any useful information.
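
The withdrawal problem can be shown with a toy fixed-point model (entirely illustrative; the predicate is mine): each PE on the ES originates the route for S if it either sees local traffic from S or has received such a route from the other PE.

```python
# A PE advertises the source route if it sees local traffic from S,
# or if the other PE attached to the same ES is advertising it.
def advertises(local_traffic, peer_route_seen):
    return local_traffic or peer_route_seen

# Steady state while S is active behind the all-active ES:
pe1_adv = advertises(True, False)     # PE1 sees local traffic from S
pe2_adv = advertises(False, pe1_adv)  # PE2 learns the route from PE1
assert pe1_adv and pe2_adv

# S goes silent: PE1 no longer sees local traffic, but it still sees
# PE2's route, and PE2 still sees PE1's, so neither ever withdraws.
pe1_adv = advertises(False, pe2_adv)
pe2_adv = advertises(False, pe1_adv)
assert pe1_adv and pe2_adv            # stuck: the route is never withdrawn
```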

Kesavan>>  Will cover the intra-ES tunneling procedure in the next revision.

I would also like to take note of the following issue.  From the draft:

       "The EVPN PEs terminate ...  PIM messages from tenant routers on their 
IRB interfaces, thus avoid sending these messages over MPLS/IP core."

A PIM control message from a given PIM router needs to reach whichever other 
PIM router is a possible unicast next hop for any multicast source or RP.  The 
scheme of having each EVPN PE terminate the PIM messages presupposes that each 
tenant router will have the nearest EVPN PE as its unicast next hop towards the 
multicast source or RP.

This is likely to be a common scenario, but it certainly is not the only 
scenario.  A tenant might have several PIM routers on a given BD, where each 
PIM router is attached to a different PE.  The PIM routers could be IGP 
neighbors in the tenant's IGP, and may be exchanging IGP updates with each 
other.  In this case, PIM control messages from one tenant PIM router on the BD 
need to reach the other tenant routers on the BD.

Kesavan>>   IGPs are usually terminated at the PE in the EVPN fabric.  This is 
the typical deployment model.

For example, suppose Tenant Router R1 on BD1 attaches to PE1, and Tenant Router 
R2 on a different ES of BD1 attaches to PE2.  If R1 and R2 are IGP neighbors, 
R2 may see R1 as the next hop to a given source S.  In that case, R2 may choose 
to target a PIM Join(S,G) to R1.

In this scenario, the PIM control messages between R1 and R2 have to be sent 
between PE1 and PE2.  Since PIM control messages have a TTL of 1, they would 
have to be sent on BD1's BUM tunnels rather than on the IP multicast tunnels.

Now the question is, if R2 sends PIM Join(S,G) to R1, how does R2 get the (S,G) 
traffic from R1?  Either PE1 has to send it on BD1's BUM tunnel, or else PE2 
has to figure out that it needs to pull (S,G) traffic from PE1 on an IP 
multicast tunnel.  The spec needs to explain how this situation is handled.  If 
the (S,G) traffic travels on BD1's BUM tunnel, the spec also has to make it 
clear how that traffic gets to other BDs.

BTW, section 6.5 of the draft says that any frame containing an IP packet whose 
destination address is in the range 224/8 is sent as a BUM frame.  I suspect 
that 224.0.0/24 is what is meant, as that seems to be the IPv4 multicast 
link-local address space.
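
The distinction is easy to check with the standard library: 224.0.0.0/24 is the IPv4 link-local multicast block, whereas "224/8" is only one slice of the full multicast range 224.0.0.0/4.

```python
import ipaddress

link_local = ipaddress.ip_network("224.0.0.0/24")

assert ipaddress.ip_address("224.0.0.5") in link_local      # e.g. AllSPFRouters
assert ipaddress.ip_address("224.1.2.3") not in link_local  # routable multicast
assert ipaddress.ip_address("239.1.1.1").is_multicast       # multicast, but not in 224/8
```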

One more thing.  The draft says that SPT-ONLY (RFC 6514 section 14) mode should 
be the default configuration.  This has several problems:

- SPT-ONLY mode requires each PE to function as an RP, which creates a 
considerable amount of additional work for the PE (handling the register 
messages and maintaining a large number of (S,G) states).  It also requires the 
PE to originate a Source Active A-D route for each (S,G), a route that would 
not otherwise be needed.

- If the tenant or MVPN customer already has a multicast infrastructure with 
Rendezvous Points (RPs), it may be impossible to use SPT-ONLY mode, as this 
mode may not be compatible with the customer/tenant's infrastructure.  However, 
it may still be desirable to have RP-free operation for multicast groups whose 
sources and receivers are all in the EVPN domain.

- SPT-ONLY mode can sometimes be made compatible with an existing 
tenant/customer multicast infrastructure by having the PEs participate in the 
BSR or Auto-RP protocols, and/or by having the PEs participate in MSDP.  This 
would not generally be regarded as a simplification.

- If one is interworking with an MVPN whose PEs are configured to use RPT-SPT 
mode (RFC 6514 section 13), one must configure the EVPN-PEs to use RPT-SPT mode 
as well, because the two modes are not interoperable.  I believe most MVPN 
deployments use RPT-SPT mode.

So I don't see the grounds for recommending the SPT-ONLY mode as the default.  
The choice between SPT-ONLY mode and RPT-SPT mode depends on many factors and 
requires knowledge of (a) a particular tenant's deployment scenario, and (b) if 
MVPN interworking is being done, the mode that is being used by the MVPN nodes.

Kesavan >>  Using spt-only mode has advantages compared to rpt-spt mode in an 
EVPN-only fabric; hence it is recommended as the default.  BTW, the solution 
supports rpt-spt mode as well, which can be used when interoperating with an 
existing MVPN network that uses rpt-spt mode.


_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess