Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Rob Shakir Wed, 18 Jul 2012 10:54:51 -0700

Hi Shane,

Thanks for the comments again, and apologies (again!) for the delay in 
responding.


Please find my responses in-line as [rjs].

On 11 Jul 2012, at 17:50, Shane Amante wrote:

>>> [...snip...]
>> 
>> [rjs]: I tried to add something to cover this that fits in with Section 1.1:
>> 
>>                       <t>
>> The combination of the increased number of deployments of BGP-4 as an
>> intra-AS routing protocol, its use for the propagation of additional types
>> of routing and service information, and the growth of IP services has
>> resulted in a substantial increase in the volume of information carried
>> within BGP-4. In numerous networks, RIB sizes of the order of millions of
>> entries exist, with particular high-scale points existing at BGP speakers
>> performing aggregation or functionality designed to improve utilisation of
>> network resources (e.g., route reflector hierarchies). Whilst clearly an
>> increase in the amount of routing information carried in BGP results in
>> greater impact to services during failures, it is also critical to their
>> recovery time. The increased time to compute new paths following a failure
>> and subsequently re-learn them following recoveries results in greater
>> impact of failures within the protocol, and hence adds further weight to
>> the requirement to avoid failures affecting all routing, or service,
>> information carried via a particular adjacency. Whilst an argument could be
>> made that the convergence time of BGP-4 can be reduced through additional
>> computational resource being deployed, it is notable that significant
>> challenges continue to exist for operators scaling BGP-4, and hence
>> mechanisms which improve the scalability of the protocol are of particular
>> note.
>>                       </t>
> 
> 
> The above looks good, but I've made some minor modifications.  See below.
> ---snip---
> The combination of the increased number of deployments of BGP-4 as an
> intra-AS routing protocol, its use for the propagation of additional types of
> routing and service information, and the growth of IP services has resulted
> in a substantial increase in the volume of information carried within BGP-4.
> In numerous networks, RIB sizes of the order of millions of entries exist
> within individual BGP speakers, with particularly high-scale points exhibited
> at BGP speakers performing aggregation or functionality designed to improve
> utilisation of network resources (e.g., route reflector hierarchies).
> Clearly, an increase in the amount of routing information carried in BGP
> results in greater impact to services during failures, which is only
> amplified by a corresponding increase in recovery times. Following a failure,
> there is a substantial recovery time to learn, compute and distribute new
> paths, which results in a greater observed impact to services affected, and
> hence adds further weight to the requirement to avoid failures altogether or,
> at least, mitigate their impact to the narrowest scope possible (e.g., a
> specific NLRI). Whilst an argument could be made that convergence time of
> BGP-4 could potentially be reduced through deployment of additional
> computational resource, it is notable that such a solution is not necessarily
> straightforward from an implementation or deployment point-of-view (e.g.,
> scaling computation resources within a single address-family is difficult).
> Thus, significant challenges continue to exist for operators when scaling
> BGP-4 deployments, and hence mechanisms which improve the scalability of
> BGP-4 are very important.
> ---snip---

[rjs]: Thanks, other than some minor editorial changes I adopted this paragraph 
-- it seems like a good hybrid.


>>> [...snip...]
>> 
>> [rjs]: I'm not quite clear on whether this gets the point across completely 
>> - do we think that it is just that things have become in the realm of 
>> provisioning activities, or rather is it that there are more and more 
>> functions that are overloading onto BGP. I agree that this sentence doesn't 
>> necessarily capture that - but do you think that it's the generic 
>> information transfer protocol between PEs, as well as replacing provisioning 
>> mechanisms?
> 
> I believe that you are correct, and better off, in stating "more and more 
> functions that are overloaded (sic) onto BGP".  Although, I'm not sure that 
> "overloaded" is an appropriate adjective.  

[rjs]: I guess there may be negative connotations of 'overloaded'; what I
really mean is maybe "layered" onto BGP -- poor wording on my part perhaps.

> The point I was trying to get at is as follows.  I think there's a continuum 
> of information exchanged within BGP from real-time information (reachability) 
> to less dynamic (perhaps, even static) information, with _examples_ of the 
> latter being auto-discovery/provisioning use cases.  While traditional 
> applications, such as vanilla Internet service for which BGP was originally 
> designed, only fall into the "real-time information" category ... there are a 
> lot of new(er) applications that do not fit "neatly" in a single category 
> and, in fact, span the range of real-time to less dynamic categories 
> depending on which facet of a particular protocol you look at, (examples 
> being: IPVPN, MVPN, VPLS-BGP, etc.).  Regardless, I don't think it's prudent 
> to make value judgements (particularly at this point in time when these 
> protocols are already widely deployed and successful) as to the "correctness" 
> of these functions/services being in BGP, since that's bound to be very 
> subjective.  Rather, we need to recognize the world for what it is today,
> which is why I think use of the word "overloaded" may be inappropriate.
> Furthermore, I think that talking about this in such a context is only
> recognizing a symptom (the more complex the system, the higher the
> probability of introducing errors), when in reality we should be trying to
> focus in on the root problem: since we've put so many eggs in one basket, we
> need unnoticeable (or, faster) recovery from errors that affect real-time,
> reachability information.

[rjs]: Completely agree with this. I think my poor choice of wording perhaps
portrayed my view as negative -- rather, the key point for me is that the
robustness and error handling that we are discussing here was designed with
the vanilla Internet service as the baseline, and as we extend the protocol to
different deployment cases (no judgement about the value of which is made),
some of the initial assumptions perhaps don't hold true. I think this agrees
with your view, insofar as we would both assert that, for the real-time
information, the behaviour required in a number of areas of the protocol is
potentially not the same as the behaviour required for relatively static
information.

>> 
>> [rjs]: Yes - the intention is to define this based on the narrowest set 
>> possible, the reason that I used this wording is that (in my view) this is 
>> defined by the NLRI actually in the message (if there were differing path 
>> attributes for NLRI, then we expect that this is packed into a second UPDATE 
>> message). Perhaps a hybrid of our wording would clarify this (unless you 
>> think the assertion above is erroneous?).
> 
> I see your point now.  How about the following hybrid text?
> ---snip---
> ... it is a requirement of any enhanced error handling mechanism to constrain 
> the error handling so that it is narrowly focused on the NLRI contained 
> within the bad UPDATE message.
> ---snip---

[rjs]: Sure, this sounds good.
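
To make the agreed requirement concrete, here is a minimal Python sketch,
assuming a toy model that is not taken from the draft: the Update and
PathAttribute classes, the "malformed" flag and apply_update() are all
hypothetical names. It shows an error in one or more path attributes being
confined to the NLRI carried in that UPDATE, while other routes learnt over
the same adjacency are left alone. Dropping the affected prefixes is only one
possible reaction, chosen here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PathAttribute:
    type_code: int
    value: bytes
    malformed: bool = False  # hypothetical flag a real attribute parser would set


@dataclass
class Update:
    nlri: List[str]
    attributes: List[PathAttribute] = field(default_factory=list)
    withdrawn: List[str] = field(default_factory=list)


def apply_update(rib: Dict[str, List[PathAttribute]], update: Update) -> List[str]:
    """Apply one UPDATE to rib; return the prefixes dropped because of errors."""
    for prefix in update.withdrawn:
        rib.pop(prefix, None)
    # One or more bad attributes: limit the damage to this message's NLRI.
    if any(attr.malformed for attr in update.attributes):
        for prefix in update.nlri:
            rib.pop(prefix, None)
        return list(update.nlri)
    for prefix in update.nlri:
        rib[prefix] = update.attributes
    return []


# Illustrative run: two prefixes learnt cleanly, then a bad UPDATE for one.
rib: Dict[str, List[PathAttribute]] = {}
good = [PathAttribute(1, b"\x00")]
apply_update(rib, Update(nlri=["192.0.2.0/24", "198.51.100.0/24"], attributes=good))
bad = good + [PathAttribute(2, b"", malformed=True)]
affected = apply_update(rib, Update(nlri=["192.0.2.0/24"], attributes=bad))
```

After the run, only 192.0.2.0/24 has been dropped; 198.51.100.0/24 is still
in the table and the adjacency itself is untouched.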

>>> 3)  Section 2:
>>> ---snip---
>>> contained within the message.  Since in this case, the message
>>> received from the remote peer is syntactically valid, it is
>>> considered that such an UPDATE is indicative of erroneous data within
>>> a path attribute.  [...]
>>> ---snip---
>>> s/path attribute/path attributes/
>> 
>> [rjs]: Is the point here "one or more path attributes"? I'm not sure I quite 
>> understand the nit? :-)
> 
> Yes, sorry: "one or more path attributes".  (My point was you can't predict,
> here anyway, that it will only be a single path attribute that is a problem.
> Ideally, a more robust error-handling solution would not make such
> assumptions :-).

[rjs]: ACK, updated this to 'one or more' :-)

>> Many thanks again for your comments - if you could cast your eyes over the 
>> above corrections, and let me know if you feel they're sufficient, that'd be 
>> fantastic.
> 
> And, thank you Rob for your excellent work on this.

[rjs]: No worries - I'll take a read through and submit an -05 of the draft 
that merges the edits we've discussed in this thread.

Thanks again for the comments,
r.

_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
