Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Rob Shakir Tue, 10 Jul 2012 09:27:13 -0700

Hi Shane,

Thanks very much for the detailed review of this document, and apologies it has 
taken a short while for me to reply.


I've responded to your comments in-line as [rjs].

On 25 Jun 2012, at 22:15, Shane Amante wrote:

> I've read this draft and support it being published as an Informational RFC.
> 
> I do have some comments, which are not blocking, but that I do hope improve 
> the clarity & readability of the draft.
> 
> 1)  In Section 1.1, the following paragraph hints at the problems with 
> current network architectures, but doesn't really come out and say what they 
> are more directly.
> ---snip---
> Traditional network architectures would deploy an Interior Gateway
> Protocol (IGP) to carry infrastructure and customer prefixes, with an
> Exterior Gateway Protocol (EGP) such as BGP being utilised to
> propagate these prefixes to other Autonomous Systems.  [...]
> ---snip---
> IMO, it would be better if the following section more directly said that 
> scale, in terms of the sheer amount of routing information in BGP, has 
> increased substantially, particularly in places like the core of a Route 
> Reflection hierarchy with order _several_ million paths, and rising.  I wish 
> I could point you at a link to a talk Danny & I gave to GROW and/or IDR at 
> IETF 71 (?), but unfortunately I can't find a copy of the slides on the 
> IETF's Web site.  You can see that it's in the GROW WG Agenda at IETF 71, but 
> the slides link to a different presentation: 
> <http://tools.ietf.org/wg/grow/agenda?item=agenda71.html>
> 
> Anyway, IMO, this is (from my vantage point) the single factor that makes 
> outages not only much more visible, but /also/ (just as importantly) take 
> longer to recover from ... (at the end of the day, CPU is finite, there are 
> no "easy" answers wrt using multi-core processors to speed-up convergence of 
> single AFI's, DRAM speeds are not keeping pace, etc.)  This may be useful to 
> mention here to "set the stage", as it were, since throughout the draft 
> there's a continuous (and, good) dialogue of not making convergence times 
> worse or CPU load increase as a result of incorporating better recovery 
> capabilities.  IOW, another benefit you don't really mention anywhere, but 
> probably should, is that /theoretically/ some of these mechanisms may help 
> further promote scalability of BGP.  Obviously, we won't know that until we 
> are evaluating solutions and researchers have time to study them, etc.

[rjs]: I tried to add something to cover this that fits in with Section 1.1:

                        <t>
                            The combination of the increased number of 
deployments of BGP-4 as an intra-AS routing protocol, its use for the 
propagation of additional types of routing and service information, and the 
growth of IP services has resulted in a substantial increase in the volume of 
information carried within BGP-4. In numerous networks, RIB sizes of the order 
of millions of entries exist, with particular high-scale points existing at BGP 
speakers performing aggregation or functionality designed improve utilisation 
of network resources (e.g., route reflector hierarchies). Whilst clearly an 
increase in the amount routing information carried in BGP results in greater 
impact to services during failures, it is also critical to their recovery time. 
The increased time to compute new paths following a failures and subsequently 
re-learn them following recoveries results in greater impact of failures within 
the protocol, and hence adds further weight to the requirement to a
 void failures affecting all routing, or service, information carried via a 
particular adjacency. Whilst an argument could be made the convergence time of 
BGP-4 can be reduced through additional computational resource being deployed, 
it is notable that significant challenges continue to exist for operators of 
scaling BGP-4, and hence mechanisms which improve the scalability of the 
protocol are of particular note.
                        </t>

[rjs]: Do you think this captures the point that was missing? I tried to avoid 
making any point that asserts that the requirements defined are going to make 
BGP scale better, but rather just call out that operationally, throwing 
resources at the problem isn't something that's always possible, so anything 
that reduces the amount of "work" BGP needs to do is interesting.

> 
> Related to the above, there is also the following paragraph:
> ---snip---
> Along with this change in role, the nature of the IP routing
> information that is carried has changed.  BGP has become a ubiquitous
> means by which service information can be propagated between devices.
> ---snip---
> ... it may be better to say that the original scope of BGP wrt *just* 
> exchanging reachability information has expanded to include functions that 
> *were* typically in the domain of 'traditional' provisioning activities, and 
> leave it at that.  (You are obviously familiar with things like BGP A-D + 
> VPLS-LDP vs. VPLS-BGP vs. RFC 4364 ... where the former divorces service 
> discovery [provisioning] from signaling/network reachability, whereas the 
> latter two tightly couple service discovery & network reachability).

[rjs]: I'm not quite clear on whether this gets the point across completely - 
do we think that it is just that things have become in the realm of 
provisioning activities, or rather is it that there are more and more functions 
that are overloading onto BGP. I agree that this sentence doesn't necessarily 
capture that - but do you think that it's the generic information transfer 
protocol between PEs, as well as replacing provisioning mechanisms?

> 
> 2)  In Section 3, wrt the discussion on draft-ietf-idr-optional-transitive, 
> it may be much more compelling to state that practically this solution has 
> (currently) only been defined to be applicable to 3 out of the 29 total BGP 
> path attributes that have been assigned by IANA, at this time.  And, even if 
> you *could* to go farther, then only 10 out of the total 29 BGP path 
> attributes are optional transitive.  So, the short story is: a lot more work 
> in the area of session preservation
> of iBGP sessions remains to be done, in particular wrt the other types of 
> mandatory, discretionary, etc. attributes ... (assuming that such a thing is 
> even technically possible with BGPv4, as we know it).

[rjs]: Agreed, it is definitely worth highlighting this. I amended the 
paragraph mentioned to cover this:

                    <t>
                        At the time of writing, error handling mechanisms 
related to optional, transitive attributes - such as <xref 
target='I-D.ietf-idr-optional-transitive' /> are restricted to handling only a 
subset of attribute errors - whereas the operational requirement is to expand 
this coverage to the widest set of errors possible (i.e., all semantic errors 
within UPDATE messages). Additionally, where approaches applicable to a greater 
number of attributes are proposed (e.g., <xref 
target='I-D.chen-ebgp-error-handing'>), these are limited to deployment in eBGP 
applications only, where requirements also exist in intra-domain cases. As 
such, it is envisaged that if extended to cover these expanded cases, these 
mechanisms provide a means to avoid the transmission of a NOTIFICATION message 
to a remote BGP speaker, based on a single erroneous message, where at all 
possible, and hence meet this requirement. Critical errors, including those 
whereby the NLRI cannot be extract
 ed from the UPDATE message, represent cases whereby the receiving system 
cannot handle the error gracefully based on this mechanism.
                    </t>


> 
> [ snipped ]
> 
> 4)  In Section 4, I can't really discern a strong (?) preference for one of 
> the associated recommendations to recovering RIB consistency.  I *think* what 
> you should say, more clearly, is there is a strong preference toward narrow, 
> targeted refresh mechanisms covering specific routes over those broader 
> solutions (recovering an entire Adj-RIB-In), to reduce the
> impact on CPU, DRAM, etc.  Perhaps you could add a sentence at the end of the 
> 4th paragraph, where you discuss draft-zeng-one-time-prefix-orf, along the 
> lines of what I just mentioned above?  The good news is you state this very 
> clearly, but much later, in Section 7.  Perhaps you can pull this sentence 
> from Section 7 up to Section 4, as well?
> ---snip---
> [...] It is recommended that where available, any automatic (or
> manual) triggered recovery mechanism behaviour utilises such targeted
> means in preference to any whole RIB refresh mechanism (such as
> ROUTE-REFRESH).
> ---snip--- 

[rjs]: Agreed, yes. I think that there is a strong preference for the targeted 
mechanism in terms of scale. I rephrased the Section 3 text to be stronger, and 
left in the Section 7 text:

        <t>
                Whilst re-advertisement of the whole BGP RIB provides a means 
by which withdrawn NLRI can be re-advertised, there are some scaling 
implications that must be considered. In the case that a ROUTE-REFRESH is 
generated, all NLRI must be re-packed into UPDATE messages and advertised by 
one speaker on the BGP session, whilst the other must receive all UPDATE 
messages, and validate the RIB's consistency. In order to avoid the 
control-plane load, it is therefore a requirement to utilise targeted 
mechanisms where possible, rather than incurring the additional load on both 
the advertising and receiving speaker of building and processing UPDATEs for 
the entire contents of the RIB.
        </t>


> 5)  Section 5 seems targeted solely at recreating the data structures at one 
> side of a BGP session, (specifically a receiver of a BGP session).  However, 
> who's to say the corruption is not actually in the Adj-RIB-Out of the BGP 
> transmitter?  Thus, transmitting you an Adj-RIB-Out all over again, even 
> "gracefully", is unlikely to do much good.  Perhaps it would be good to 
> consider whether the "Receiving Speaker", in Graceful Restart terminology, 
> should consider 'gracefully' purging and recreating his Adj-RIB-Out data 
> structures, as well, before re-xmit'ing them to the "Restarting Speaker"?

[rjs]: Absolutely, this definitely wasn't handled in the draft -- thanks for 
pointing it out. I added the following text - do you think this is sufficient?

                    <t>
                        In some cases, the erroneous condition may be due to 
corruption of the Adj-RIB-Out on the advertising BGP speaker - rather than 
caused by the receiving speaker's state. In these cases, where existing 
structures are replayed whilst performing graceful restart functionality, the 
error condition is not necessarily resolved. Therefore, it is recommended that 
during a session restart event, as described within this section, the 
advertising speaker purge and rebuild RIB structures, in order to resolve any 
corruption within these structures.
                    </t> 

> 
> 
> 
> 
> Nits:
> 1)  From <http://tools.ietf.org/html/rfc4271#section-3.1>:
> ---snip---
> For the purpose of this protocol, a route is defined as a unit of
> information that pairs a set of destinations with the attributes of a
> path to those destinations.  The set of destinations are systems
> whose IP addresses are contained in one IP address prefix that is
> carried in the Network Layer Reachability Information (NLRI) field of
> an UPDATE message, and the path is the information reported in the
> path attributes field of the same UPDATE message.
> ---snip---
> ... thus, to be more correct, I think you should scrub the document to say 
> "route" in places where you're saying "prefix".  Some examples include:
> 
> http://tools.ietf.org/html/draft-ietf-grow-ops-reqs-for-bgp-error-handling-04#section-1.2
> ---snip---
> o  It is unacceptable within modern deployments of the BGP-4 protocol
>   that a single erroneous UPDATE packet affects prefixes that it
>   does not carry.  This requirement therefore requires some
> ---snip---
> s/affects prefixes/affects routes/
> 
> ---snip---
> this reset of the BGP-4 session results in interruption to
> forwarding packets (by means of withdrawing prefixes installed by
> BGP-4 into a device's RIB, and subsequently FIB).  To this end,
> ---snip---
> s/withdrawing prefixes/withdrawing routes/

[rjs]: Thanks, I've done this to reflect this definition.

> 
> 2)  In Section 2, this may be just wording, but I'm unclear on what you mean 
> in the below where you say: "... handle the error in a manner focused on the 
> NLRI contained within the message".  Are you trying to say something like 
> "constrain the logging and/or ignoring of malformed UPDATE messages specific 
> to the narrowest subset of routes for which malformed path attributes have 
> been received?"  I'm not suggesting my wording is necessarily better, in this 
> case, but perhaps you could look at being a bit more precise below?
> ---snip---
> Where an UPDATE message is considered invalid by a BGP speaker due to
> an error within a path attribute that is not the NLRI (where the
> definition of NLRI includes reachability information encoded in the
> MP_REACH_NLRI and MP_UNREACH_NLRI attributes as specified in
> [RFC4760]) it is a requirement of any enhanced error handling
> mechanism to handle the error in a manner focused on the NLRI
> contained within the message.  
> ---snip---

[rjs]: Yes - the intention is to define this based on the narrowest set 
possible, the reason that I used this wording is that (in my view) this is 
defined by the NLRI actually in the message (if there were differing path 
attributes for NLRI, then we expect that this is packed into a second UPDATE 
message). Perhaps a hybrid of our wording would clarify this (unless you think 
the assertion above is erroneous?).

> 
> 3)  Section 2:
> ---snip---
> contained within the message.  Since in this case, the message
> received from the remote peer is syntactically valid, it is
> considered that such an UPDATE is indicative of erroneous data within
> a path attribute.  [...]
> ---snip---
> s/path attribute/path attributes/

[rjs]: Is the point here "one or more path attributes"? I'm not sure I quite 
understand the nit? :-)

> 
> 4)  In Section 4, there do you really mean "transitive" here or do you 
> instead mean "transient"?
> ---snip---
> It is of particular note for both means of recovering RIB consistency
> described that these are effective only when considering transitive
> errors within an implementation - for instance, should an RFC ...
> ---snip---

[rjs]: Transient, thanks.

> ... also, in the following, do you mean "dynamic" instead of "transitive" in 
> the below?
> ---snip---
> [...] It is not advisable that a transitive
> filter and advertisement mechanism is triggered by all error handling
> events due to the load this is likely to place on the neighbour
> receiving such a request.  Where this BGP speaker is a relatively
> ---snip---

[rjs]: Dynamic is definitely what is meant here.

> 
> 5)  In Section 7.1, I think you probably want to s/NLRI/route/ in several 
> places, e.g.:
> ---snip---
> ... upstream speaker to perform a best-path selection, and re-advertise a
> new set of __NLRI__ before the downstream system is able to converge to a
> new path.
> ---snip---

[rjs]: Thanks, nit corrected.

Many thanks again for your comments - if you could cast your eyes over the 
above corrections, and let me know if you feel they're sufficient, that'd be 
fantastic.

In response to all the comments received during this review period, I'll push 
an -05 of the draft.

Kind regards,
r.

_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow

Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Reply via email to