On 1 Jan 2013, at 17:27, Chris Hall <[email protected]> wrote:
> […snip…]

Hi All.

I think this is a good summary of the different approaches. In ops-reqs-for-bgp-error-handling-06, there is no category of "fatal" essentially because (as you highlight) the line between the fatal and the critical cases is somewhat blurry. I would propose that we do not add another category of error for "fatal".

If I go back to the proposal I made on 31/12:

> Would the GROW working group be happy if we address Chris' concern related to
> "lost NLRI" (which AFAICS is really the case where we have >1 type of NLRI
> attribute within a single message) by adding a note that an error remains
> Non-Critical if _at least one_ NLRI attributes can be successfully parsed?
>
> I'm unclear here as to whether we're addressing a common situation where
> multiple sets of NLRI are being contained within a single message [1]. If so,
> then adding "at least one" and then a further point that an implementation
> SHOULD use a single NLRI attribute per UPDATE message, and put this at the
> start of the attributes would seem to be a fair way forward.

I would suggest that adding the following wording to § 3 of the draft addresses this, and clarifies the issue of "lost" NLRI:

"An error SHOULD be defined as Non-Critical if at least one NLRI attribute within an erroneous message can be successfully parsed. In cases where more than one attribute containing NLRI is included within a single UPDATE message, this may result in cases where some NLRI contained within subsequent attributes are missed, particularly where length errors exist in the message. In order to minimise the risk of such occurrences, it is recommended that an implementation SHOULD include only one attribute containing NLRI per message."

Additionally -- from the discussions that Jeff Wheeler raised, around repeated errors.
In § 5, it seems that there is a further requirement, which I would suggest that we state as:

"In order to address repeated instances of critical errors, an implementation SHOULD provide a means by which an operator can enable such errors to be ignored. Where a mechanism of this nature is implemented, it provides a means by which an operator may avoid prolonged session failure which results in isolation from one, or more, routing domains. An operator deploying such a mechanism MUST be aware that holding such sessions up may result in inconsistency within the RIB, which may cause incorrect forwarding of traffic (e.g., loops, or blackholing)."

Some feedback on whether this addresses the points discussed over the last few days would be appreciated.

Happy New Year,
r.
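
As an illustrative aside, the two proposed additions (the § 3 classification rule and the § 5 operator knob) could be sketched roughly as follows. This is only a sketch under stated assumptions: the names `Severity`, `classify_error`, `session_action` and `ignore_critical` are hypothetical and appear nowhere in the draft, the per-attribute parse results are assumed to be supplied by the caller, and the sketch assumes that an error is Critical when no NLRI attribute can be parsed and that a Critical error otherwise resets the session, neither of which the proposed wording itself states.

```python
from enum import Enum


class Severity(Enum):
    NON_CRITICAL = "non-critical"
    CRITICAL = "critical"


def classify_error(nlri_attr_parsed):
    """Classify an erroneous UPDATE following the proposed § 3 wording.

    nlri_attr_parsed holds one boolean per NLRI-bearing attribute in the
    message: True if that attribute was parsed successfully.
    (Hypothetical input shape; the draft does not define one.)
    """
    # Non-Critical if at least one NLRI attribute was parsed successfully.
    if any(nlri_attr_parsed):
        return Severity.NON_CRITICAL
    # Assumption: anything else is treated as Critical.
    return Severity.CRITICAL


def session_action(severity, ignore_critical=False):
    """Pick a session action following the proposed § 5 wording.

    ignore_critical models the operator-enabled knob (hypothetical name).
    The returned strings are illustrative labels, not protocol terms.
    """
    if severity is Severity.NON_CRITICAL:
        return "keep-session"
    if ignore_critical:
        # The session is held up, but the RIB may now be inconsistent
        # (possible loops or blackholing, per the proposed text).
        return "keep-session-rib-may-be-inconsistent"
    # Assumption: a Critical error otherwise resets the session.
    return "reset-session"
```

For example, an UPDATE carrying two NLRI-bearing attributes of which only the first parses would be passed as `classify_error([True, False])`. The NLRI in the second attribute is then the "lost" NLRI discussed above, which is the risk that the one-attribute-per-message recommendation is intended to reduce.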
