Re: [GROW] WGLC: draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Christopher Morrow Sun, 22 Jul 2012 19:17:23 -0700

Rob,
Did you want to spin a new version of the draft and get final comments
from Shane, then move this along to IESG-land?


Or are there still comments/issues to deal with from other folk? (the
russ/robert discussion seemed to peter out as well)

-chris

On Wed, Jul 18, 2012 at 1:54 PM, Rob Shakir <[email protected]> wrote:
> Hi Shane,
>
> Thanks for the comments again, and apologies (again!) for the delay in 
> responding.
>
> Please find my responses in-line as [rjs].
>
> On 11 Jul 2012, at 17:50, Shane Amante wrote:
>
>>>> [...snip...]
>>>
>>> [rjs]: I tried to add something to cover this that fits in with Section 1.1:
>>>
>>>                       <t>
>>>                           The combination of the increased number of
>>> deployments of BGP-4 as an intra-AS routing protocol, its use for the
>>> propagation of additional types of routing and service information, and the
>>> growth of IP services has resulted in a substantial increase in the volume
>>> of information carried within BGP-4. In numerous networks, RIB sizes of the
>>> order of millions of entries exist, with particularly high-scale points
>>> existing at BGP speakers performing aggregation or functionality designed
>>> to improve utilisation of network resources (e.g., route reflector
>>> hierarchies). Whilst clearly an increase in the amount of routing
>>> information carried in BGP results in greater impact to services during
>>> failures, it is also critical to their recovery time. The increased time to
>>> compute new paths following a failure and subsequently re-learn them
>>> following recoveries results in greater impact of failures within the
>>> protocol, and hence adds further weight to the requirement to avoid
>>> failures affecting all routing, or service, information carried via a
>>> particular adjacency. Whilst an argument could be made that the convergence
>>> time of BGP-4 can be reduced through additional computational resource
>>> being deployed, it is notable that significant challenges continue to exist
>>> for operators scaling BGP-4, and hence mechanisms which improve the
>>> scalability of the protocol are of particular note.
>>>                       </t>
>>
>>
>> The above looks good, but I've made some minor modifications.  See below.
>> ---snip---
>> The combination of the increased number of deployments of BGP-4 as an
>> intra-AS routing protocol, its use for the propagation of additional types
>> of routing and service information, and the growth of IP services has
>> resulted in a substantial increase in the volume of information carried
>> within BGP-4. In numerous networks, RIB sizes of the order of millions of
>> entries exist within individual BGP speakers, with particularly high-scale
>> points exhibited at BGP speakers performing aggregation or functionality
>> designed to improve utilisation of network resources (e.g., route reflector
>> hierarchies). Clearly, an increase in the amount of routing information
>> carried in BGP results in greater impact to services during failures, which
>> is only amplified by a corresponding increase in recovery times. Following a
>> failure, there is a substantial recovery time to learn, compute and
>> distribute new paths, which results in a greater observed impact to services
>> affected, and hence adds further weight to the requirement to avoid failures
>> altogether or, at least, mitigate their impact to the narrowest scope
>> possible (e.g., a specific NLRI). Whilst an argument could be made that
>> convergence time of BGP-4 could potentially be reduced through deployment of
>> additional computational resource, it is notable that this solution is not
>> necessarily straightforward from an implementation or deployment
>> point-of-view (e.g., scaling computation resources within a single
>> address-family is difficult).  Thus, significant challenges continue to
>> exist for operators when scaling BGP-4 deployments, and hence mechanisms
>> which improve the scalability of BGP-4 are very important.
>> ---snip---
>
> [rjs]: Thanks, other than some minor editorial changes I adopted this 
> paragraph -- it seems like a good hybrid.
>
>
>>>> [...snip...]
>>>
>>> [rjs]: I'm not quite clear on whether this gets the point across completely 
>>> - do we think that it is just that things have become in the realm of 
>>> provisioning activities, or rather is it that there are more and more 
>>> functions that are overloading onto BGP. I agree that this sentence doesn't 
>>> necessarily capture that - but do you think that it's the generic 
>>> information transfer protocol between PEs, as well as replacing 
>>> provisioning mechanisms?
>>
>> I believe that you are correct, and better off, in stating "more and more 
>> functions that are overloaded (sic) onto BGP".  Although, I'm not sure that 
>> "overloaded" is an appropriate adjective.
>
> [rjs]: I guess there may be negative connotations of 'overloaded'; what I
> really mean is maybe "layered" onto BGP -- poor wording perhaps.
>
>> The point I was trying to get at is as follows.  I think there's a continuum 
>> of information exchanged within BGP from real-time information 
>> (reachability) to less dynamic (perhaps, even static) information, with 
>> _examples_ of the latter being auto-discovery/provisioning use cases.  While 
>> traditional applications, such as vanilla Internet service for which BGP was 
>> originally designed, only fall into the "real-time information" category ... 
>> there are a lot of new(er) applications that do not fit "neatly" in a single 
>> category and, in fact, span the range of real-time to less dynamic 
>> categories depending on which facet of a particular protocol you look at, 
>> (examples being: IPVPN, MVPN, VPLS-BGP, etc.).  Regardless, I don't think 
>> it's prudent to make value judgements (particularly at this point in time 
>> when these protocols are already widely deployed and successful) as to the 
>> "correctness" of these functions/services being in BGP, since that's bound 
>> to be very subjective.  Rather, we need to recognize the world for what it
>> is today, which is why I think use of the word "overloaded" may be
>> inappropriate.  Furthermore, I think that talking about this in such a
>> context is only recognizing a symptom (the more complex the system, the
>> higher the probability is to introduce errors), when in reality we should
>> be trying to focus in on the root problem: since we've put so many eggs in
>> one basket, we need unnoticeable (or, faster) recovery from errors that
>> affect real-time, reachability information.
>
> [rjs]: Completely agree with this. I think my poor choice of wording perhaps 
> portrayed my view as negative -- rather, the key point for me is that the 
> robustness and error handling that we are discussing here is designed with 
> the vanilla Internet service as the baseline - and as we extend the protocol 
> to different deployment cases (no judgement about the value of which is 
> made), then some of the initial assumptions perhaps don't hold true. I think 
> this is in agreement with yourself, insofar that I think we would both assert 
> that for the real-time information, potentially the behaviour required in a 
> number of areas of the protocol is not the same as the behaviour required for 
> relatively static information.
>
>>>
>>> [rjs]: Yes - the intention is to define this based on the narrowest set 
>>> possible, the reason that I used this wording is that (in my view) this is 
>>> defined by the NLRI actually in the message (if there were differing path 
>>> attributes for NLRI, then we expect that this is packed into a second 
>>> UPDATE message). Perhaps a hybrid of our wording would clarify this (unless 
>>> you think the assertion above is erroneous?).
>>
>> I see your point now.  How about the following hybrid text?
>> ---snip---
>> ... it is a requirement of any enhanced error handling mechanism to 
>> constrain the error handling so that it is narrowly focused on the NLRI 
>> contained within the bad UPDATE message.
>> ---snip---
>
> [rjs]: Sure, this sounds good.
>
>>>> 3)  Section 2:
>>>> ---snip---
>>>> contained within the message.  Since in this case, the message
>>>> received from the remote peer is syntactically valid, it is
>>>> considered that such an UPDATE is indicative of erroneous data within
>>>> a path attribute.  [...]
>>>> ---snip---
>>>> s/path attribute/path attributes/
>>>
>>> [rjs]: Is the point here "one or more path attributes"? I'm not sure I 
>>> quite understand the nit? :-)
>>
>> Yes, sorry: "one or more path attributes".  (My point was you can't predict, 
>> here anyway, that it will only be a single path attribute that is a problem.
>> Ideally, a more robust error-handling solution would not make such 
>> assumptions :-).
>
> [rjs]: ACK, updated this to 'one or more' :-)
>
>>> Many thanks again for your comments - if you could cast your eyes over the 
>>> above corrections, and let me know if you feel they're sufficient, that'd 
>>> be fantastic.
>>
>> And, thank you Rob for your excellent work on this.
>
> [rjs]: No worries - I'll take a read through and submit an -05 of the draft 
> that merges the edits we've discussed in this thread.
>
> Thanks again for the comments,
> r.
>
> _______________________________________________
> GROW mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/grow
