Hi Shane, Thanks very much for the detailed review of this document, and apologies it has taken a short while for me to reply.
I've responded to your comments in-line as [rjs]. On 25 Jun 2012, at 22:15, Shane Amante wrote: > I've read this draft and support it being published as an Informational RFC. > > I do have some comments, which are not blocking, but that I do hope improve > the clarity & readability of the draft. > > 1) In Section 1.1, the following paragraph hints at the problems with > current network architectures, but doesn't really come out and say what they > are more directly. > ---snip--- > Traditional network architectures would deploy an Interior Gateway > Protocol (IGP) to carry infrastructure and customer prefixes, with an > Exterior Gateway Protocol (EGP) such as BGP being utilised to > propagate these prefixes to other Autonomous Systems. [...] > ---snip--- > IMO, it would be better if the following section more directly said that > scale, in terms of the sheer amount of routing information in BGP, has > increased substantially, particularly in places like the core of a Route > Reflection hierarchy with order _several_ million paths, and rising. I wish > I could point you at a link to a talk Danny & I gave to GROW and/or IDR at > IETF 71 (?), but unfortunately I can't find a copy of the slides on the > IETF's Web site. You can see that it's in the GROW WG Agenda at IETF 71, but > the slides link to a different presentation: > <http://tools.ietf.org/wg/grow/agenda?item=agenda71.html> > > Anyway, IMO, this is (from my vantage point) the single factor that makes > outages not only much more visible, but /also/ (just as importantly) take > longer to recover from ... (at the end of the day, CPU is finite, there are > no "easy" answers wrt using multi-core processors to speed-up convergence of > single AFI's, DRAM speeds are not keeping pace, etc.) This may be useful to > mention here to "set the stage", as it were, since throughout the draft > there's a continuous (and, good) dialogue of not making convergence times > worse or CPU load increase as a result of incorporating better recovery > capabilities. IOW, another benefit you don't really mention anywhere, but > probably should, is that /theoretically/ some of these mechanisms may help > further promote scalability of BGP. Obviously, we won't know that until we > are evaluating solutions and researchers have time to study them, etc. [rjs]: I tried to add something to cover this that fits in with Section 1.1: <t> The combination of the increased number of deployments of BGP-4 as an intra-AS routing protocol, its use for the propagation of additional types of routing and service information, and the growth of IP services has resulted in a substantial increase in the volume of information carried within BGP-4. In numerous networks, RIB sizes of the order of millions of entries exist, with particular high-scale points existing at BGP speakers performing aggregation or functionality designed improve utilisation of network resources (e.g., route reflector hierarchies). Whilst clearly an increase in the amount routing information carried in BGP results in greater impact to services during failures, it is also critical to their recovery time. The increased time to compute new paths following a failures and subsequently re-learn them following recoveries results in greater impact of failures within the protocol, and hence adds further weight to the requirement to a void failures affecting all routing, or service, information carried via a particular adjacency. Whilst an argument could be made the convergence time of BGP-4 can be reduced through additional computational resource being deployed, it is notable that significant challenges continue to exist for operators of scaling BGP-4, and hence mechanisms which improve the scalability of the protocol are of particular note. </t> [rjs]: Do you think this captures the point that was missing? I tried to avoid making any point that asserts that the requirements defined are going to make BGP scale better, but rather just call out that operationally, throwing resources at the problem isn't something that's always possible, so anything that reduces the amount of "work" BGP needs to do is interesting. > > Related to the above, there is also the following paragraph: > ---snip--- > Along with this change in role, the nature of the IP routing > information that is carried has changed. BGP has become a ubiquitous > means by which service information can be propagated between devices. > ---snip--- > ... it may be better to say that the original scope of BGP wrt *just* > exchanging reachability information has expanded to include functions that > *were* typically in the domain of 'traditional' provisioning activities, and > leave it at that. (You are obviously familiar with things like BGP A-D + > VPLS-LDP vs. VPLS-BGP vs. RFC 4364 ... where the former divorces service > discovery [provisioning] from signaling/network reachability, whereas the > latter two tightly couple service discovery & network reachability). [rjs]: I'm not quite clear on whether this gets the point across completely - do we think that it is just that things have become in the realm of provisioning activities, or rather is it that there are more and more functions that are overloading onto BGP. I agree that this sentence doesn't necessarily capture that - but do you think that it's the generic information transfer protocol between PEs, as well as replacing provisioning mechanisms? > > 2) In Section 3, wrt the discussion on draft-ietf-idr-optional-transitive, > it may be much more compelling to state that practically this solution has > (currently) only been defined to be applicable to 3 out of the 29 total BGP > path attributes that have been assigned by IANA, at this time. And, even if > you *could* to go farther, then only 10 out of the total 29 BGP path > attributes are optional transitive. So, the short story is: a lot more work > in the area of session preservation > of iBGP sessions remains to be done, in particular wrt the other types of > mandatory, discretionary, etc. attributes ... (assuming that such a thing is > even technically possible with BGPv4, as we know it). [rjs]: Agreed, it is definitely worth highlighting this. I amended the paragraph mentioned to cover this: <t> At the time of writing, error handling mechanisms related to optional, transitive attributes - such as <xref target='I-D.ietf-idr-optional-transitive' /> are restricted to handling only a subset of attribute errors - whereas the operational requirement is to expand this coverage to the widest set of errors possible (i.e., all semantic errors within UPDATE messages). Additionally, where approaches applicable to a greater number of attributes are proposed (e.g., <xref target='I-D.chen-ebgp-error-handing'>), these are limited to deployment in eBGP applications only, where requirements also exist in intra-domain cases. As such, it is envisaged that if extended to cover these expanded cases, these mechanisms provide a means to avoid the transmission of a NOTIFICATION message to a remote BGP speaker, based on a single erroneous message, where at all possible, and hence meet this requirement. Critical errors, including those whereby the NLRI cannot be extract ed from the UPDATE message, represent cases whereby the receiving system cannot handle the error gracefully based on this mechanism. </t> > > [ snipped ] > > 4) In Section 4, I can't really discern a strong (?) preference for one of > the associated recommendations to recovering RIB consistency. I *think* what > you should say, more clearly, is there is a strong preference toward narrow, > targeted refresh mechanisms covering specific routes over those broader > solutions (recovering an entire Adj-RIB-In), to reduce the > impact on CPU, DRAM, etc. Perhaps you could add a sentence at the end of the > 4th paragraph, where you discuss draft-zeng-one-time-prefix-orf, along the > lines of what I just mentioned above? The good news is you state this very > clearly, but much later, in Section 7. Perhaps you can pull this sentence > from Section 7 up to Section 4, as well? > ---snip--- > [...] It is recommended that where available, any automatic (or > manual) triggered recovery mechanism behaviour utilises such targeted > means in preference to any whole RIB refresh mechanism (such as > ROUTE-REFRESH). > ---snip--- [rjs]: Agreed, yes. I think that there is a strong preference for the targeted mechanism in terms of scale. I rephrased the Section 3 text to be stronger, and left in the Section 7 text: <t> Whilst re-advertisement of the whole BGP RIB provides a means by which withdrawn NLRI can be re-advertised, there are some scaling implications that must be considered. In the case that a ROUTE-REFRESH is generated, all NLRI must be re-packed into UPDATE messages and advertised by one speaker on the BGP session, whilst the other must receive all UPDATE messages, and validate the RIB's consistency. In order to avoid the control-plane load, it is therefore a requirement to utilise targeted mechanisms where possible, rather than incurring the additional load on both the advertising and receiving speaker of building and processing UPDATEs for the entire contents of the RIB. </t> > 5) Section 5 seems targeted solely at recreating the data structures at one > side of a BGP session, (specifically a receiver of a BGP session). However, > who's to say the corruption is not actually in the Adj-RIB-Out of the BGP > transmitter? Thus, transmitting you an Adj-RIB-Out all over again, even > "gracefully", is unlikely to do much good. Perhaps it would be good to > consider whether the "Receiving Speaker", in Graceful Restart terminology, > should consider 'gracefully' purging and recreating his Adj-RIB-Out data > structures, as well, before re-xmit'ing them to the "Restarting Speaker"? [rjs]: Absolutely, this definitely wasn't handled in the draft -- thanks for pointing it out. I added the following text - do you think this is sufficient? <t> In some cases, the erroneous condition may be due to corruption of the Adj-RIB-Out on the advertising BGP speaker - rather than caused by the receiving speaker's state. In these cases, where existing structures are replayed whilst performing graceful restart functionality, the error condition is not necessarily resolved. Therefore, it is recommended that during a session restart event, as described within this section, the advertising speaker purge and rebuild RIB structures, in order to resolve any corruption within these structures. </t> > > > > > Nits: > 1) From <http://tools.ietf.org/html/rfc4271#section-3.1>: > ---snip--- > For the purpose of this protocol, a route is defined as a unit of > information that pairs a set of destinations with the attributes of a > path to those destinations. The set of destinations are systems > whose IP addresses are contained in one IP address prefix that is > carried in the Network Layer Reachability Information (NLRI) field of > an UPDATE message, and the path is the information reported in the > path attributes field of the same UPDATE message. > ---snip--- > ... thus, to be more correct, I think you should scrub the document to say > "route" in places where you're saying "prefix". Some examples include: > > http://tools.ietf.org/html/draft-ietf-grow-ops-reqs-for-bgp-error-handling-04#section-1.2 > ---snip--- > o It is unacceptable within modern deployments of the BGP-4 protocol > that a single erroneous UPDATE packet affects prefixes that it > does not carry. This requirement therefore requires some > ---snip--- > s/affects prefixes/affects routes/ > > ---snip--- > this reset of the BGP-4 session results in interruption to > forwarding packets (by means of withdrawing prefixes installed by > BGP-4 into a device's RIB, and subsequently FIB). To this end, > ---snip--- > s/withdrawing prefixes/withdrawing routes/ [rjs]: Thanks, I've done this to reflect this definition. > > 2) In Section 2, this may be just wording, but I'm unclear on what you mean > in the below where you say: "... handle the error in a manner focused on the > NLRI contained within the message". Are you trying to say something like > "constrain the logging and/or ignoring of malformed UPDATE messages specific > to the narrowest subset of routes for which malformed path attributes have > been received?" I'm not suggesting my wording is necessarily better, in this > case, but perhaps you could look at being a bit more precise below? > ---snip--- > Where an UPDATE message is considered invalid by a BGP speaker due to > an error within a path attribute that is not the NLRI (where the > definition of NLRI includes reachability information encoded in the > MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in > [RFC4760]) it is a requirement of any enhanced error handling > mechanism to handle the error in a manner focused on the NLRI > contained within the message. > ---snip--- [rjs]: Yes - the intention is to define this based on the narrowest set possible, the reason that I used this wording is that (in my view) this is defined by the NLRI actually in the message (if there were differing path attributes for NLRI, then we expect that this is packed into a second UPDATE message). Perhaps a hybrid of our wording would clarify this (unless you think the assertion above is erroneous?). > > 3) Section 2: > ---snip--- > contained within the message. Since in this case, the message > received from the remote peer is syntactically valid, it is > considered that such an UPDATE is indicative of erroneous data within > a path attribute. [...] > ---snip--- > s/path attribute/path attributes/ [rjs]: Is the point here "one or more path attributes"? I'm not sure I quite understand the nit? :-) > > 4) In Section 4, there do you really mean "transitive" here or do you > instead mean "transient"? > ---snip--- > It is of particular note for both means of recovering RIB consistency > described that these are effective only when considering transitive > errors within an implementation - for instance, should an RFC ... > ---snip--- [rjs]: Transient, thanks. > ... also, in the following, do you mean "dynamic" instead of "transitive" in > the below? > ---snip--- > [...] It is not advisable that a transitive > filter and advertisement mechanism is triggered by all error handling > events due to the load this is likely to place on the neighbour > receiving such a request. Where this BGP speaker is a relatively > ---snip--- [rjs]: Dynamic is definitely what is meant here. > > 5) In Section 7.1, I think you probably want to s/NLRI/route/ in several > places, e.g.: > ---snip--- > ... upstream speaker to perform a best-path selection, and re-advertise a > new set of __NLRI__ before the downstream system is able to converge to a > new path. > ---snip--- [rjs]: Thanks, nit corrected. Many thanks again for your comments - if you could cast your eyes over the above corrections, and let me know if you feel they're sufficient, that'd be fantastic. In response to all the comments received during this review period, I'll push an -05 of the draft. Kind regards, r. _______________________________________________ GROW mailing list [email protected] https://www.ietf.org/mailman/listinfo/grow
