Hi,

Comments inline.

We are planning to publish a new version of the draft with our proposed changes, on which John Scudder has graciously worked with us, once the black-out has been lifted.

Yours Irrespectively,

John

Juniper Business Use Only

> -----Original Message-----
> From: John Scudder via Datatracker <[email protected]>
> Sent: Thursday, July 8, 2021 3:48 PM
> To: The IESG <[email protected]>
> Cc: [email protected]; bess-[email protected]; [email protected]; Jeffrey (Zhaohui) Zhang <[email protected]>; Jeffrey (Zhaohui) Zhang <[email protected]>
> Subject: John Scudder's Discuss on draft-ietf-bess-evpn-inter-subnet-forwarding-14: (with DISCUSS and COMMENT)
>
> [External Email. Be cautious of content]
>
>
> John Scudder has entered the following ballot position for
> draft-ietf-bess-evpn-inter-subnet-forwarding-14: Discuss
>
> When responding, please keep the subject line intact and reply to all email
> addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to
> https://urldefense.com/v3/__https://www.ietf.org/iesg/statement/discuss-criteria.html__;!!NEt6yMaO-gk!Sx0rT5dHMP4Qugb8-6e8tN8copPtUNSeufjEDFBoUF37qoY3S_zwyQiVMuzi4PE$
> for more information about DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://urldefense.com/v3/__https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-inter-subnet-forwarding/__;!!NEt6yMaO-gk!Sx0rT5dHMP4Qugb8-6e8tN8copPtUNSeufjEDFBoUF37qoY3S_zwyQiVDhzN0_Q$
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> I found this document difficult to review. Some of this might be due to the
> fact that I'm not an expert on EVPN, but I think some of the reason is that
> the document could be structured better and expressed more clearly.
> The only reason I'm not opposing progression of the document on the grounds
> that it's too unclear to implement is that I've been told, and accept on
> faith, that implementations *have* been successfully written starting from
> the spec, which implies it's implementable -- I guess by people who are
> expert in EVPN already, it wouldn't be implementable by me.
>
> In any case, I do have some points I would like to discuss, that are more
> actionable.
>
> 1. I agree with Robert Wilton's comment on -09:
>
> ```
> One question I have is whether it is possible to have a deployment where
> some devices support synchronous mode and others support asynchronous mode.
> Am I right in presuming that this is not supported and if so is this
> capability signaled in any way? Or is the expectation that this would be
> controlled via deployment choice of network device, or though configuration
> management?
> ```
>
> This issue still exists in -14. I think it should be addressed in the
> document. Similarly, I agree with Warren Kumari's comment, also on -09:
>
> ```
> I would strongly recommend that the authors read the OpsDir review at:
> https://urldefense.com/v3/__https://datatracker.ietf.org/doc/review-ietf-bess-evpn-inter-subnet-forwarding-09-opsdir-lc-jaeggli-2020-07-06/__;!!NEt6yMaO-gk!Sx0rT5dHMP4Qugb8-6e8tN8copPtUNSeufjEDFBoUF37qoY3S_zwyQiVP1-j8m4$
> , especially the: "it would be helpful if section 4 would be more explicit
> for non-implementors on when symetric or asymetric modules would be chosen,
> as it stands the variation basically reads like the enumeration of the
> features of various implementations." comment (which I fully agree with).
> ```
>
> It seems both of these comments could -- and should! -- be addressed by
> adding a few paragraphs talking about these topics. This could be done
> either in §4, as Warren suggests, or in some other section (e.g. you could
> add an "operational considerations" section).
[JD] We will add a new section 4.2 to the draft as follows:

4.2 Operational Considerations

Symmetric and Asymmetric IRB modes may coexist in the same network, and an ingress PE that supports both forwarding modes for a given tenant can interwork with egress PEs that support either IRB mode. The egress PE will indicate the desired forwarding mode for a given host based on the presence of the Label2 field and the IP-VRF route-target in the EVPN MAC/IP Advertisement route. If the Label2 field of the received MAC/IP Advertisement route for host H1 is non-zero, and one of its route-targets identifies the IP-VRF, the ingress PE will use Symmetric IRB mode when forwarding packets destined to H1. If the Label2 field is zero and the MAC/IP Advertisement route for H1 does not carry any route-target that identifies the IP-VRF, the ingress PE will use Asymmetric mode when forwarding traffic to H1.

As an example that illustrates the previous statement, suppose PE1 and PE2 need to forward packets from TS2 to TS4 in the example of Figure 4. Since both PEs are attached to the bridge table of the destination host, Symmetric and Asymmetric IRB modes are both possible as long as the ingress PE, PE1, supports both modes. The forwarding mode will depend on the mode configured in the egress PE, PE2. That is:

• If PE2 is configured for Symmetric IRB mode, PE2 will advertise TS4 MAC/IP addresses in a MAC/IP Advertisement route with a non-zero Label2 field, e.g., Label2=Lx, and a route-target that identifies IP-VRF1 in PE1. IP4 will be installed in PE1’s IP-VRF1, TS4’s ARP and MAC information will also be installed in PE1’s IRB interface ARP table and BT1 respectively. When a packet from TS2 destined to TS4 is looked up in PE1’s IP-VRF route-table, a longest prefix match lookup will find IP4 in the IP-VRF, and PE1 will forward using the Symmetric IRB mode and Label Lx.
• However, if PE2 is configured for Asymmetric IRB mode, PE2 will advertise TS4 MAC/IP information in a MAC/IP Advertisement route with a zero Label2 field and no route-target identifying IP-VRF1. In this case, PE2 will install TS4 information in its ARP table and BT1. When a packet from TS2 to TS4 arrives at PE1, a longest prefix match on IP-VRF1’s route-table will yield the local IRB interface to BT1, where a subsequent ARP and BT lookup will provide the information for an Asymmetric forwarding mode to PE2.

Refer to [I-D.ietf-bess-evpn-modes-interop] for more information about interoperability between Symmetric and Asymmetric forwarding modes.

The choice between Symmetric or Asymmetric mode is based on the operator’s preference and it is a trade-off between scale (better in the Symmetric IRB mode) and control plane simplicity (Asymmetric IRB mode simplifies the control plane). In cases where a tenant has hosts for every subnet attached to all (or most) the PEs, the ARP and MAC entries need to be learned by all PEs anyway and therefore the Asymmetric IRB mode simplifies the forwarding model and saves space in the IP-VRF route-table, since host routes are not installed in the route-table. However, if the tenant does not need to stretch subnets (BDs) to multiple PEs and inter-subnet-forwarding is needed, the Symmetric IRB model will save ARP and BT space in all the PEs (in comparison with the Asymmetric IRB model).

>
> 2. Section 7.1
>
> I’m guessing this question isn’t unique to this document, but since this is
> where I encountered it, I’ll ask: it seems as though the described mobility
> procedures are vulnerable to a condition where a particular (IP, MAC)
> appears at two different NVEs at the same time. If this condition exists
> (either innocently, or maliciously) what prevents the source and target NVEs
> from continually attempting to claim the (IP, MAC) from one another,
> flooding the network with updates all the while?
>
> (This applies to 7.2 as well.)
>
> Since this seems like a potential security issue, I'm including it in my
> DISCUSS.

[JD] The intention was that the procedures of section 15.1 of RFC 7432 (https://datatracker.ietf.org/doc/html/rfc7432#section-15.1) would be followed. We will add the following sentence to the first paragraph of section 7:

The procedures of section 15.1 of RFC 7432 (https://datatracker.ietf.org/doc/html/rfc7432#section-15.1) MUST be followed.

>
>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> Below are a number of questions and comments that I hope might help improve
> the document. I haven't chosen to make them blocking by including them in my
> DISCUSS; nonetheless I would appreciate replies to them.
>
> 1. I agree with the comments by several of the other reviewers, that there
> are just too many gratuitous acronyms in this document. They aren't the only
> thing that makes it hard to read, but they certainly contribute. I'm
> disappointed to see this hasn't been addressed between versions -09 and -14.
> It would have been a small matter of search-and-replace to go through and
> expand most of the acronyms.

[JD] We will use broadcast domain, bridge table, extended community, MAC/IP Advertisement route, and IP Prefix route throughout the draft.

>
> 2. Section 2
>
> ```
> R1: The solution must allow for both inter-subnet and intra-subnet
> traffic belonging to the same tenant to be locally routed and bridged
> respectively. The solution must provide IP routing for inter-subnet
> traffic and Ethernet Bridging for intra-subnet traffic. It should be
> noted that if an IP-VRF in a NVE is configured for IPv6 and that NVE
> receives IPv4 traffic on the corresponding VLAN, then the IPv4
> traffic is treated as L2 traffic and it is bridged.
> Also vise versa,
> if an IP-VRF in a NVE is configured for IPv4 and that NVE receives
> IPv6 traffic on the corresponding VLAN, then the IPv6 traffic is
> treated as L2 traffic and it is bridged.
>
> R2: The solution must support bridging for non-IP traffic.
> ```
>
> R1 is a little tortured, where you add all the caveats about “treated as L2
> traffic”. Seems to me like it would fall out more naturally if you had
> simply introduced the concepts of routable and non-routable traffic, where
> routable traffic is that for which a suitable IP-VRF exists. That would also
> have the pleasant effect of making R2 say “… must support bridging for
> non-routable traffic” instead of “non-IP traffic”, which is technically
> incorrect (since per R1 you might have non-routable IP traffic).

[JD] We will remove R2 and replace R1 with the following:

R1: The solution must provide each tenant with IP routing of its inter-subnet traffic and Ethernet bridging of its intra-subnet traffic and non-routable traffic, where non-routable traffic refers both to non-IP traffic and IP traffic whose version differs from the IP version configured in the IP-VRF. For example, if an IP-VRF in a NVE is configured for IPv6 and that NVE receives IPv4 traffic on the corresponding VLAN, then the IPv4 traffic is treated as non-routable traffic.

>
> ```
> R3: The solution must allow inter-subnet switching to be disabled on
> a per VLAN basis on PEs where the traffic needs to be backhauled to
> another node (i.e., for performing FW or DPI functionality).
> ```
>
> What’s “switching”? The document is about routing vs. bridging, which do you
> mean? I think you mean “routing”. IMO you should get rid of the word
> “switching” and replace with something less ambiguous, e.g. “routing”. (Both
> here and the one other place in the doc where you use “switching”.)
>
> Also, I think you don’t mean “i.e.”, I think you mean “e.g.”. The meaning of
> “i.e.” is “in other words”. The meaning of “e.g.” is “for example”.
> The best way to avoid these problems, IMO, is to simply write out what you
> mean, so in this case write “(for example, for performing FW or DPI
> functionality).” (And oh by the way, you haven’t defined or expanded FW or
> DPI, please do so.)

[JD] We will change both occurrences of 'switching' to 'routing'. We will also replace the new R2, after renumbering, with the following:

R2: The solution must allow IP routing of inter-subnet traffic to be disabled on a per-VLAN basis on those PEs that are backhauling that traffic to another PE for routing.

>
> 3. Section 4
>
> ```
> o references to ARP table in the context of asymmetric IRB is a
> logical view of a forwarding table that maintains an IP to MAC
> binding entry on a layer 3 interface for both IPv4 and IPv6.
> These entries are not subject to ARP or ND protocol.
> ```
>
> This passage shines a spotlight on the fact that “ARP table” as it’s used in
> this document is a misnomer, since it’s a table that is not (necessarily)
> populated by ARP. I don’t propose that you change the nomenclature, since
> it’s firmly established even though wrong — but it might be worth adding the
> first sentence or one like it to your Terminology section.

[JD] We will add the following definition to section 1:

ARP table: A logical view of a forwarding table on a PE that maintains an IP to MAC binding entry on an IP interface for both IPv4 and IPv6. These entries are learned through ARP/ND or through EVPN.

And we will change the referenced paragraph to the following:

In the context of asymmetric IRB, an implementation may choose to place ARP table entries learned through EVPN directly into the appropriate forwarding table rather than placing them in ARP/ND tables.

>
> 4. Section 4
>
> Figure 2 depicts BT2 being present on the ingress PE, but the text makes it
> clear that in the symmetric mode that this figure depicts, BT2 doesn’t
> actually need to be there. Wouldn’t it be clearer if you didn’t show it?
[JD] We want the same bridge tables in both figure 2 and figure 3. This makes the differences between asymmetric IRB and symmetric IRB clearer.

>
> 5. Section 4
>
> I have a hard time parsing this text:
>
> ```
> Each BT on a PE is
> associated with a unique VLAN (e.g., with a BD)
> ```
>
> So, 1 VLAN —> at least 1 BT (1:many)
>
> ```
> where in turn it is
> associated with a single MAC-VRF
> ```
>
> So, 1 MAC-VRF —> at least 1 BT (1:many)
>
> ```
> in the case of VLAN-Based mode or a
> number of BTs can be associated with a single MAC-VRF in the case of
> VLAN-Aware Bundle mode.
> ```
>
> So, 1 MAC-VRF —> at least 1 BT (1:many)
>
> Since this is stated as an exception I guess that means you meant the
> preceding two (that I parsed as 1:many) are actually supposed to be 1:1? If
> so I think this needs a rewrite (it probably does regardless, for clarity).

[JD] We will replace the paragraph preceding figure 4 in section 4 with the following:

The following sections define the control and data plane procedures for symmetric and asymmetric IRB on ingress and egress PEs. The following figure is used to describe these procedures, showing a single IP-VRF and a number of broadcast domains on each PE for a given tenant. I.e., an IP-VRF connects one or more EVIs, each EVI contains one MAC-VRF, each MAC VRF consists of one or more bridge tables, one per broadcast domain, and a PE has an associated IRB interface for each broadcast domain.

>
> 6. Section 4.1
>
> When you write “Internet standard bit order“, do you mean “network byte
> order“? Although even network byte order appears to be non-applicable, since
> the values are shown with an explicit byte order.
>
> I realize the definitions are merely pasted from RFC 5798 and that ship has
> sailed, but unless you can explain what “(in hex, in Internet standard
> bit-order)” is supposed to mean, I suggest removing it.
> (Alternately and less desirably, make it explicit that you’re providing a
> direct quotation of RFC 5798.)

[JD] We will remove both instances of "(in hex, in Internet standard bit-order)".

>
> 7. Section 5.1
>
> You say the Encapsulation Extended Community and Router’s MAC Extended
> Community have to be sent, but you say nothing about the required values.
> For Router's MAC, §8.1 specifies the required value, I suggest a forward
> reference to it. For Encapsulation, the closest I was able to find to a
> place where this is specified was section 9.1.1, but that's only an example.
> There really needs to be some place where it's spelled out. A bare minimum
> would be to cite RFC 9012 §4.1, but that just provides the syntax -- you
> really should say something more about how to decide what value to send. For
> that matter, it could be what valueS to send -- is it legal for a NVE to
> advertise multiple Encapsulation Extended Communities? You don't say it
> isn't, and there are potential reasons to do so.

[JD] We will replace the penultimate paragraph of section 5.1 with the following:

This route is advertised with one or more Encapsulation extended communities [RFC9012], one for each encapsulation type supported by the advertising PE. If one or more encapsulation types require an Ethernet frame, a single Router's MAC extended community, section 8.1, is also advertised. This extended community specifies the MAC address to be used as the inner destination MAC address in an Ethernet frame sent to the advertising PE.

>
> 8. Section 5.2
>
> ```
> o Using MAC-VRF Route Target (and Ethernet Tag if different from
> zero), it identifies the corresponding MAC-VRF (and BT). If the
> MAC-VRF (and BT) exists (e.g., it is locally configured) then it
> ```
>
> You use “e.g.” so I presume there might be other reasons the MAC-VRF and BT
> might exist even if not locally configured?
>
> ```
> imports the MAC address into it. Otherwise, it does not import
> the MAC address.
>
> o Using IP-VRF route target, it identifies the corresponding IP-VRF
> and imports the IP address into it.
> ```
>
> You don’t provide any conditional language in this bullet about “if the
> IP-VRF exists”. Why is that caveat required for MAC-VRF but not for IP-VRF?

[JD] We will replace the two bullet items in the first paragraph of section 5.2 with the following:

The MAC-VRF route target and Ethernet Tag, if the latter is non-zero, are used to identify the correct MAC-VRF and bridge table and if they are found the MAC address is imported.

The IP-VRF route target is used to identify the correct IP-VRF and if it is found the IP address is imported.

>
> 9. Section 5.2
>
> ```
> The inclusion of MPLS label2 field in this route signals to the
> receiving PE that this route is for symmetric IRB mode and MPLS
> label2 needs to be installed in forwarding path to identify the
> corresponding IP-VRF.
> ```
>
> I was unable to make head nor tail of this paragraph. I suppose §5.4 is
> where the behavior is actually specified, so in a way it doesn’t matter
> (although maybe a forward reference would help).

[JD] We will replace the referenced paragraph with the following:

If the MPLS label2 field is non-zero, it means that this route is to be used for symmetric IRB and the MPLS label2 value is to be used when sending a packet for this IP address to the advertising PE.

>
> 10. Section 5.2
>
> ```
> If the receiving PE receives this route with both the MAC-VRF and
> IP-VRF route targets and if the receiving PE does not support either
> asymmetric or symmetric IRB modes, then if it has the corresponding
> MAC-VRF, it only imports the MAC address. Otherwise, if it doesn't
> have the corresponding MAC-VRF, it must not import this route.
> ```
>
> If it doesn’t support either asymmetric or symmetric IRB modes, then doesn’t
> that mean it doesn’t implement this specification at all? In that
> circumstance, how do you expect your “must not” to be respected?
[JD] We will strike the last sentence of the referenced paragraph.

>
> 11. Section 5.3
>
> ```
> If host B's (MAC, IP) has not yet been
> learnt either via a gratuitous ARP OR via a prior gleaning procedure,
> a new gleaning procedure MUST be triggered
> ```
>
> Since you’ve used MUST here, you MUST provide a reference to where the “new
> gleaning procedure” is specified.
>
> Also, has not been learnt by whom? The procedure must be triggered where?
>
> 12. Section 5.3
>
> The second paragraph, that begins "Consider a subnet A", is tremendously
> confusing to a first-time reader (or at least to this first-time reader). I
> realize you probably think you're being helpful by providing a worked
> example, but as I read through it, it was the opposite of helpful. This is
> especially true because §5 and its subsections is about "Symmetric IRB
> Procedures" -- and the paragraph in question provides no procedures.
>
> Some options to improve the situation --
>
> - Remove the paragraph entirely.
> - Preface the paragraph with "as an example to show why advertisement as
>   RT-5 is required,"

[JD] For both comment 11 and comment 12, we will replace the second paragraph in section 5.3 with the following:

I.e., if a given host's (MAC, IP) association is unknown, and an ingress PE needs to send a packet to that host, then that ingress PE needs to know which egress PEs are attached to the subnet in which the host resides in order to send the packet to one of those PEs, causing the PE receiving the packet to probe for that host.

>
> 13. Section 5.4
>
> ```
> o global mode: VNI is set to the received label2 in the route which
> is domain-wide assigned. This VNI value from received label2 MUST
> be the same as the locally configured VNI for the IP VRF as all
> PEs in the NVO MUST be configured with the same IP VRF VNI for
> this mode of operation.
> ```
>
> What action is to be taken if this MUST is violated?
[JD] We will add a subsequent paragraph:

If the received label2 value does not match the locally configured VNI value the route MUST NOT be used and an error message SHOULD be logged.

>
> 14. Section 6.1
>
> ```
> For asymmetric IRB mode, Router's MAC EC is not needed because
> ```
>
> Please either expand “EC” or add it to your definitions section. (Also
> applies to 5.1)

[JD] For asymmetric IRB mode, a Router's MAC extended community is not needed because ...

In section 5.1: For symmetric IRB mode, a Router's MAC extended community is needed to carry the PE's ...

>
> 15. Section 6.2
>
> ```
> o If only MAC-VRF route target is used, then the receiving PE uses
> the MAC-VRF route target to identify the corresponding IP-VRF --
> i.e., many MAC-VRF route targets map to the same IP-VRF for a
> given tenant. In this case, MAC-VRF may be used by the receiving
> PE to identify the corresponding IP VRF
> ```
>
> Do you mean “in this case, the MAC-VRF *route target* may be used…”?

[JD] We will replace the last sentence in the referenced paragraph with the following:

In this case, the MAC-VRF route target may be used by the receiving PE to identify the corresponding IP VRF.

>
> 16. Section 6.2
>
> ```
> If the receiving PE receives the MAC/IP Advertisement route with MPLS
> label2 field and it uses symmetric IRB mode
> ```
>
> This entire section is entitled “asymmetric IRB procedures“. Why is there
> specification language regarding symmetric procedures in it? (I’m pretty
> sure this is not the only place this kind of problem appears.)

[JD] We will strike the referenced paragraph.

>
> 17. Section 7.3
>
> ```
> On the source NVE, an age-out timer (for the silent host that has
> moved) is used to trigger an ARP probe. This age-out timer can be
> either ARP timer or MAC age-out timer and this is an implementation
> choice.
> The ARP request gets sent both locally to all the attached
> TSes on that subnet as well as it gets sent to all the remote NVEs
> (including the target NVE) participating in that subnet. The source
> NVE also withdraw the EVPN MAC/IP Advertisement route with only the
> MAC address (if it has previously advertised such a route).
> ```
>
> Wouldn’t the source NVE only withdraw the route after a timeout had expired?
> As you have written this paragraph, in case the silent TS has not moved, the
> following would happen:
>
> ```
> Time t: age-out timer fires, ARP probe is sent
> Time t: NVE withdraws route advertisement
> Time u > t: TS receives ARP probe, sends ARP reply
> Time v > u: NVE receives ARP reply
> Time v: NVE re-advertises route
> ```
>
> Presumably this churn isn’t what you intended.

[JD] The assumption is that the host has moved so it's probably best to withdraw the route. I.e., we're optimizing for the normal case. We can always write another draft optimized for reclusive hosts.

>
> 18. Section 9.2
>
> How does the NVE learn what subnets are behind its attached TS?

[JD] Through configuration. We will add the following text after figure 7 in section 9.2:

Note that in figure 7, above, SN1 and SN2 are configured on NVE1, which then advertises each in an IP Prefix route. Similarly, SN3 is configured on NVE2, which then advertises it in an IP Prefix route.

>
> 19. Section 9.2
>
> What about if TS4 wants to reach SN1? How does it know where to send the
> packet? (I suppose the answer may be the same as for #18.)

[JD] We will add the following text after the text added in the reply to 18., above:

If TS4, for example, wants to reach SN1, it uses its default route and sends the packet to the MAC address associated with the IRB interface on NVE2, NVE2 then makes an IP lookup in its IP-VRF, and finds an entry for SN1.

>
>
_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess
