Hi Chris,

Thanks for this detailed analysis. It is akin to something that Alton Lo and I worked through whilst defining the critical and semantic error types (and suggested inclusions).
If you'll forgive me for responding to some particular points, I feel this might aid the discussion and positioning. I have added comments in-line marked [rjs].

On 31 Aug 2012, at 09:02, Chris Hall wrote:

> This is all pretty low level stuff. I can hear an argument that the
> requirements document is not the place for this level of detail.
> However, without a more precise understanding of how broken attributes
> may be parsed, requirements for how to deal with them are hard to
> specify and to interpret.

[rjs]: What this draft intends to do is provide expectations, requirements and context for error handling in BGP-4, based on current deployments (and operators' experience). It also puts forward requirements for how each type of error is reacted to in a broad sense. Essentially, it came from defining why amending the error handling behaviour is required, and from providing a framework against which we can hang the different developments that are being discussed in IDR, such that they meet the operational challenges that come from amending this behaviour and form a complete set of solutions to meet the problem space.

[rjs]: I think the error handling solutions draft (draft-ietf-idr-error-handling) should take the work that we have done within IDR and GROW in this draft and build the next level of detail, which I think you've made a great start on. I would like to try to keep the requirements draft such that it can be referred to by both existing attributes and future ones.

> With NLRI mixed up in the attributes, either one plays safe and treats
> all attribute errors as Critical, or a much more detailed analysis of
> attribute parsing is required. What is the cost of missing some NLRI
> which were sent, but were obscured by some other broken attribute ?
> What is the risk ? What degree of broken-ness of an attribute can be
> deemed not to invalidate the parsing of the attributes before and/or
> after it ? Is that different for different attributes ?
>

[rjs]: Please note that the requirements draft does not present distinctions such as recoverable and ignorable. We went around this loop previously. I think that in some cases, some specific errors may be handled by 'patching' or 'ignoring' them. But generically, these are exceptions. The requirements try to define broader categories; if a particular attribute needs something else (e.g., AS4_PATH may have information it can recover from other attributes), then this can be handled in the error handling solution considerations for these attributes, or as it is defined going forward.

[rjs]: It came to my attention whilst reviewing this that the existing "ignore" wording in the draft is potentially erroneous based on RFC 4271:

    All errors detected while processing the UPDATE message MUST be
    indicated by sending the NOTIFICATION message with the Error Code
    UPDATE Message Error. The error subcode elaborates on the specific
    nature of the error.

[rjs]: Section 3's requirements discuss a need to avoid sending NOTIFICATION, and consider the proposed solution that exists. The only recommendation made is that treat-as-withdraw looks to be a safe behaviour for ensuring that one is not trusting UPDATEs that had some erroneous information in them.

[rjs]: The requirement the document makes is explicitly that not all errors are defined as critical (if they were, the requirement specified by section 3 would not be met, and we would stick with the behaviour we have right now). The reason for a distinction between critical and semantic is that there are certain errors that cannot be localised to particular NLRI.

[rjs]: I hope you do not see these comments as dismissive of what you have put together. I think that this is where operational and implementation views diverge. My view is that I need to understand what the impact on a service, the device and the network is during these error conditions (and balance the risk of incorrectness against the correctness of the protocol).
From an implementation perspective, clearly, one needs to understand exactly in which circumstances one can extract the NLRI, and the particulars of how this is achieved. I would encourage discussion that falls into the latter category, so that the solutions draft can carry the relevant guidance where required. Comments on the former should absolutely live in the requirements draft.

[rjs]: I would of course encourage further feedback on this subject from the WG.

Kind regards,
r.
