Hi Daniel, thank you for the review, comments, and helpful suggestions. I'll work on answering questions and addressing comments and respond soon.
Regards,
Greg

On Fri, Oct 23, 2020 at 5:36 AM Daniel Migault via Datatracker <nore...@ietf.org> wrote:

> Reviewer: Daniel Migault
> Review result: Has Nits
>
> Hi,
>
> I reviewed this document as part of the Security Directorate's ongoing
> effort to review all IETF documents being processed by the IESG. These
> comments were written primarily for the benefit of the Security Area
> Directors. Document authors, document editors, and WG chairs should
> treat these comments just like any other IETF Last Call comments.
> Please note also that my expertise in BGP is limited, so feel free to
> take these comments with a pinch of salt.
>
> Review Results: Has Nits
>
> Please find my comments below.
>
> Yours,
> Daniel
>
>
> Multicast VPN Fast Upstream Failover
> draft-ietf-bess-mvpn-fast-failover-11
>
> Abstract
>
>    This document defines multicast VPN extensions and procedures that
>    allow fast failover for upstream failures, by allowing downstream PEs
>    to take into account the status of Provider-Tunnels (P-tunnels) when
>    selecting the Upstream PE for a VPN multicast flow, and extending BGP
>    MVPN routing so that a C-multicast route can be advertised toward a
>    Standby Upstream PE.
>
> <mglt>
> Though it might be just a nit, if MVPN designates multicast VPN, it
> might be clarifying to specify the acronym in the first sentence. This
> would later make the correlation with BGP MVPN clearer.
> </mglt>
>
>
> 1. Introduction
>
>    In the context of multicast in BGP/MPLS VPNs, it is desirable to
>    provide mechanisms allowing fast recovery of connectivity on
>    different types of failures. This document addresses failures of
>    elements in the provider network that are upstream of PEs connected
>    to VPN sites with receivers.
>
> <mglt>
> Well, I am not familiar with either BGP or MPLS. It seems that BGP/MPLS
> IP VPNs and MPLS/BGP IP VPNs are both used. I am wondering if there is
> a distinction between the two and a preferred way to designate these
> VPNs.
> My understanding is that the VPN-IPv4 characterizes the VPN while MPLS
> is used by the backbone for the transport. Since the PE are connected
> to the backbone the VPN-IPv4 needs to be labeled.
> </mglt>
>
>    Section 3 describes local procedures allowing an egress PE (a PE
>    connected to a receiver site) to take into account the status of
>    P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
>    (C-S, C-G). This method does not provide a "fast failover" solution
> <mglt>
> I understand the limitation is due to BGP convergence.
> </mglt>
>    when used alone, but can be used together with the mechanism
>    described in Section 4 for a "fast failover" solution.
>
>    Section 4 describes protocol extensions that can speed up failover by
>    not requiring any multicast VPN routing message exchange at recovery
>    time.
>
>    Moreover, section 5 describes a "hot leaf standby" mechanism, that
>    uses a combination of these two mechanisms. This approach has
>    similarities with the solution described in [RFC7431] to improve
>    failover times when PIM routing is used in a network given some
>    topology and metric constraints.
>
> [...]
>
> 3.1.1. mVPN Tunnel Root Tracking
>
>    A condition to consider that the status of a P-tunnel is up is that
>    the root of the tunnel, as determined in the x-PMSI Tunnel attribute,
>    is reachable through unicast routing tables. In this case, the
>    downstream PE can immediately update its UMH when the reachability
>    condition changes.
>
>    That is similar to BGP next-hop tracking for VPN routes, except that
>    the address considered is not the BGP next-hop address, but the root
>    address in the x-PMSI Tunnel attribute.
>
>    If BGP next-hop tracking is done for VPN routes and the root address
>    of a given tunnel happens to be the same as the next-hop address in
>    the BGP A-D Route advertising the tunnel, then checking, in unicast
>    routing tables, whether the tunnel root is reachable, will be
>    unnecessary duplication and thus will not bring any specific benefit.
>
> <mglt>
> It seems to me that the x-PMSI address designates a different interface
> than the one used by the tunnel itself. If that is correct, such a
> mechanism seems to assume that a piece of equipment that is up on one
> interface will be up on the other interfaces. I have the impression
> that a configuration change in a PE may end up in the P-tunnel being
> down, while the PE is still reachable through the x-PMSI Tunnel
> attribute. If that is a possible scenario, the current mechanisms may
> not be more efficient than those of standard BGP.
>
> Similarly, it is assumed that the tunnel is either up or down, so that
> not being up is determined as being down. I am not convinced that these
> are the only two states. Typically, services under DDoS may be down for
> a small amount of time. While this affects the network, there is not
> always a clear-cut distinction between the PE being up or down.
> </mglt>
>
>
> [...]
>
> 3.1.6. BFD Discriminator Attribute
>
>    P-tunnel status may be derived from the status of a multipoint BFD
>    session [RFC8562] whose discriminator is advertised along with an
>    x-PMSI A-D Route.
>
>    This document defines the format and ways of using a new BGP
>    attribute called the "BFD Discriminator". It is an optional
>    transitive BGP attribute. In Section 7.2, IANA is requested to
>    allocate the codepoint value (TBA2). The format of this attribute is
>    shown in Figure 1.
>
> <mglt>
> I feel that the sentence "In Section ... TBA2)." should be removed.
> </mglt>
>
>
>     0                   1                   2                   3
>     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |    BFD Mode   |                   Reserved                    |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                       BFD Discriminator                       |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    ~                         Optional TLVs                         ~
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>
>        Figure 1: Format of the BFD Discriminator Attribute
>
>    Where:
>
>       BFD Mode field is the one octet long. This specification defines
>       the P2MP BFD Session as value 1 Section 7.2.
>
>       Reserved field is three octets long, and the value MUST be zeroed
>       on transmission and ignored on receipt.
>
>       BFD Discriminator field is four octets long.
>
>
> Morin, et al.            Expires April 5, 2021                 [Page 7]
>
> Internet-Draft      mVPN Fast Upstream Failover           October 2020
>
>
>       Optional TLVs is the optional variable-length field that MAY be
>       used in the BFD Discriminator attribute for future extensions.
>       TLVs MAY be included in a sequential or nested manner. To allow
>       for TLV nesting, it is advised to define a new TLV as a variable-
>       length object. Figure 2 presents the Optional TLV format TLV that
>       consists of:
>
>       * one octet-long field of TLV 's Type value (Section 7.3)
>
>       * one octet-long field of the length of the Value field in octets
>
>       * variable length Value field.
>
>       The length of a TLV MUST be multiple of four octets.
> <mglt>
> I am wondering why the constraint on the length is not mentioned in the
> paragraph associated to the field - as opposed to a separate paragraph.
> </mglt>
>
> [..]
>
> 8. Security Considerations
>
>    This document describes procedures based on [RFC6513] and [RFC6514]
>    and hence shares the security considerations respectively represented
>    in these specifications.
>
>    This document uses p2mp BFD, as defined in [RFC8562], which, in turn,
>    is based on [RFC5880].
>    Security considerations relevant to each
>    protocol are discussed in the respective protocol specifications. An
>    implementation that supports this specification MUST use a mechanism
>    to control the maximum number of p2mp BFD sessions that can be active
>    at the same time.
>
> <mglt>
> At a high level - or at least in my interpretation - the document
> proposes a mechanism based on BFD to detect a fault in the path. Upon
> fault detection, a fail-over operation is instructed using BGP. This
> procedure is expected to perform a faster fail-over than traditional
> BGP convergence on maintaining routing tables. Once the fail-over has
> been performed, BFD confirms the new path is "legitimate" and works.
>
> It seems correct to me that the current protocol relies on BGP / BFD
> security. That said, having BFD authentication based on MD5 or SHA1 may
> suggest that stronger primitives be recommended. While this does not
> concern the current document, it seems to me that the information
> might be relayed to routing ADs.
>
> What remains unclear to me - and I assume this might be due to my lack
> of expertise in the routing area - is the impact associated with
> performing a fail-over both on 1) the data plane and 2) the standard
> BGP way to establish routing tables.
>
> Regarding the data plane, I am wondering if fail-over results in a loss
> of packets, for example - I suppose that at least the packets in the
> process of being forwarded might be lost. I believe that providing
> details on this may be good.
>
> If there are any impacts, I would also like to understand in which
> cases the decision to perform a fail-over operation may result in more
> harm than the event that has been over-interpreted. A hypothetical
> scenario could be that the non-reception of a BFD packet is interpreted
> as a PE being down while this may not be correct and the PE might have
> been simply under stress.
> A "too fast" fail-over may over-interpret it and perform a fail-over.
> If such things could happen, an attacker could leverage a micro event
> to perform network operations that are not negligible. Another way to
> see that is that an attacker might not have direct access to the
> control plane, but could use the data plane to generate stress and, in
> a sense, control the fail-over. It seems to me that some text might be
> welcome to prevent such cases from happening. This could be guidance
> for declaring a tunnel down, for example.
>
> Similarly, it would be good to add some text regarding the
> interferences with the non-fast forwarding fail-over when performed by
> standard BGP. Typically, my impression is that the fast fail-over
> mechanism is a local decision versus the BGP convergence that is more
> global. As a result, even with more time, these two mechanisms may come
> with different outcomes. One example to illustrate my point could be
> the following. Note that this is only illustrative, and I let you find
> and pick one that is more appropriate. I am thinking of a case where a
> standby PE is shared among multiple PEs - supposing this situation
> could occur. Typically, PE_1 and PE_2 are shared by PE_a, ..., PE_z. In
> case PE_a and PE_b are down, we expect PE_a to switch to PE_1 and PE_b
> to switch to PE_2. It seems to me that BGP would end up in such a
> situation, while a local decision may end up in PE_a and PE_b both
> switching to PE_1.
> </mglt>
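
As a reading aid for the attribute layout quoted above from Section 3.1.6 (Figure 1), here is a minimal Python sketch of how the value could be encoded and parsed. It is not taken from the draft or from any implementation. The function names are hypothetical, the TLV type used in the example is made up (the registry in Section 7.3 is not quoted here), and the check assumes that "the length of a TLV" means the whole TLV (Type + Length + Value), which is one possible reading of the quoted text.

```python
import struct

# BFD Mode value the quoted draft text defines for a P2MP BFD session.
P2MP_BFD_SESSION = 1


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one optional TLV: Type (1 octet), Length of Value (1 octet), Value."""
    tlv = struct.pack("!BB", tlv_type, len(value)) + value
    # Assumption: the multiple-of-four rule applies to the whole TLV.
    if len(tlv) % 4 != 0:
        raise ValueError("TLV length must be a multiple of four octets")
    return tlv


def encode_bfd_discriminator(discriminator: int,
                             mode: int = P2MP_BFD_SESSION,
                             tlvs=()) -> bytes:
    """Encode BFD Mode (1 octet), Reserved (3 zero octets), Discriminator (4 octets)."""
    # "3x" emits three zero pad octets: Reserved is zeroed on transmission.
    header = struct.pack("!B3xI", mode, discriminator)
    return header + b"".join(encode_tlv(t, v) for t, v in tlvs)


def decode_bfd_discriminator(data: bytes):
    """Return (mode, discriminator, [(tlv_type, value), ...])."""
    if len(data) < 8:
        raise ValueError("attribute value shorter than the fixed 8-octet header")
    # "3x" skips the Reserved octets: they are ignored on receipt.
    mode, discriminator = struct.unpack_from("!B3xI", data)
    tlvs = []
    offset = 8
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        end = offset + 2 + length
        if end > len(data):
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset + 2:end]))
        offset = end
    return mode, discriminator, tlvs


# Example: discriminator 0x01020304 with one hypothetical TLV (type 1, 2-octet value).
wire = encode_bfd_discriminator(0x01020304, tlvs=[(1, b"\x00\x00")])
mode, disc, tlvs = decode_bfd_discriminator(wire)
```

If the multiple-of-four rule is instead meant to apply to the Value field only, the check in `encode_tlv` would need to test `len(value)` rather than the whole TLV.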