Re: [GROW] [Idr] draft-ietf-grow-ops-reqs-for-bgp-error-handling-05

Rob Shakir Fri, 31 Aug 2012 11:03:06 -0700

Hi Chris,

Thanks for this detailed analysis. It is akin to something that Alton Lo and I 
worked through whilst defining the critical and semantic error types (and 
suggested inclusions).

If you'll forgive me for responding to some particular points, I feel this 
might aid the discussion and positioning. I have added comments in-line marked 
[rjs].

On 31 Aug 2012, at 09:02, Chris Hall wrote:

> This is all pretty low level stuff.  I can hear an argument that the
> requirements document is not the place for this level of detail.
> However, without a more precise understanding of how broken attributes
> may be parsed, requirements for how to deal with them are hard to
> specify and to interpret.

[rjs]: What this draft intends to do is provide expectations, requirements and 
context for error handling in BGP-4, based on current deployments (and 
operators' experience). It also puts forward requirements, in a broad sense, 
for how an implementation should react to each type of error. Essentially, it 
sets out why the error handling behaviour needs to be amended, and provides a 
framework against which we can hang the different developments that are being 
discussed in IDR, such that they meet the operational challenges that come from 
amending this behaviour and form a complete set of solutions to the problem 
space.

[rjs]: I think the error handling solutions draft 
(draft-ietf-idr-error-handling) should take the work that we have done within 
IDR and GROW in this draft and build the next level of detail, on which I think 
you've made a great start. I would like to keep the requirements draft general 
enough that it can be referred to for both existing attributes and future ones.

> With NLRI mixed up in the attributes, either one plays safe and treats
> all attribute errors as Critical, or a much more detailed analysis of
> attribute parsing is required.  What is the cost of missing some NLRI
> which were sent, but were obscured by some other broken attribute ?
> What is the risk ?  What degree of broken-ness of an attribute can be
> deemed not to invalidate the parsing of the attributes before and/or
> after it ?  Is that different for different attributes ?
> 

[rjs]: Please note that the requirements draft does not present distinctions 
such as recoverable and ignorable. We went around this loop previously. I think 
that in some cases, specific errors may be handled by 'patching' or 
'ignoring' them. But generically, these are exceptions. The requirements try 
to define broader categories; if a particular attribute needs something else 
(e.g., AS4_PATH may have information that can be recovered from other 
attributes) then this can be handled in the error handling solution 
considerations for that attribute, or as it is defined going forward.

[rjs]: It came to my attention whilst reviewing this that the existing "ignore" 
wording in the draft is potentially erroneous based on RFC 4271:

   All errors detected while processing the UPDATE message MUST be
   indicated by sending the NOTIFICATION message with the Error Code
   UPDATE Message Error.  The error subcode elaborates on the specific
   nature of the error.

[rjs]: Section 3's requirements discuss a need to avoid sending NOTIFICATION, 
and consider the solution that has been proposed. The only recommendation made 
is that treat-as-withdraw looks to be a safe behaviour for ensuring that one is 
not trusting UPDATEs that had some erroneous information in them.

[rjs]: The requirement the document makes is explicitly that not all errors are 
defined as critical (if they were, the requirement specified by section 3 would 
not be met, and we would stick with the behaviour we have right now). The 
reason for a distinction between critical and semantic is that there are 
certain errors that cannot be localised to particular NLRI.
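
As an illustration only, the critical/semantic split can be sketched as a 
decision on whether the affected NLRI can still be identified. This is a 
minimal sketch, not text from the draft: the names UpdateError, 
nlri_extractable, classify and react are hypothetical, and the 
"session-scoped handling" string is a placeholder assumption for whatever 
handling critical errors receive. Only treat-as-withdraw for the other case 
comes from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class ErrorClass(Enum):
    CRITICAL = "critical"  # error cannot be localised to particular NLRI
    SEMANTIC = "semantic"  # affected NLRI can still be identified


@dataclass
class UpdateError:
    description: str
    # Hypothetical flag: did the parser still manage to identify the NLRI
    # carried by the erroneous UPDATE?
    nlri_extractable: bool


def classify(err: UpdateError) -> ErrorClass:
    """Classify an UPDATE error by whether it can be localised to NLRI."""
    return ErrorClass.SEMANTIC if err.nlri_extractable else ErrorClass.CRITICAL


def react(err: UpdateError, nlri: Sequence[str]) -> str:
    """Return a label for the reaction; no session state is touched here."""
    if classify(err) is ErrorClass.CRITICAL:
        # Placeholder (assumption): the affected routes are unknown, so the
        # reaction cannot be limited to individual prefixes.
        return "session-scoped handling"
    # Do not trust the UPDATE: treat the prefixes it carried as withdrawn.
    return "treat-as-withdraw: " + ", ".join(nlri)
```

For example, react(UpdateError("bad ORIGIN value", True), ["192.0.2.0/24"]) 
yields the treat-as-withdraw label for that one prefix, whereas an error with 
nlri_extractable set to False falls through to the session-scoped placeholder.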

[rjs]: I hope you do not see these comments as dismissive of what you have put 
together - I think that this is where operational and implementation views 
diverge. My view is that I need to understand what the impact on the service, 
the device and the network is during these error conditions (and balance the 
risk of incorrectness against the correctness of the protocol). From an 
implementation perspective, clearly, one needs to understand exactly in which 
circumstances one can extract the NLRI, and the particulars of how this is 
achieved. I would encourage discussion that falls into the latter category so 
that the solutions draft has the relevant guidance where required. Comments on 
the former should absolutely live in the requirements draft.

[rjs]: I would of course encourage further feedback on this subject from the WG.

Kind regards,
r.

