On 1 Jan 2013, at 17:27, Chris Hall <[email protected]> wrote:

> […snip…]


Hi All.

I think this is a good summary of the different approaches. In
ops-reqs-for-bgp-error-handling-06 there is no "fatal" category, essentially
because (as you highlight) the line between the fatal and the critical cases is
somewhat blurry. I would propose that we do not add a separate "fatal" error
category.

If I go back to the proposal I made on 31/12:

> Would the GROW working group be happy if we address Chris' concern related to 
> "lost NLRI" (which AFAICS is really the case where we have >1 type of NLRI 
> attribute within a single message) by adding a note that an error remains 
> Non-Critical if _at least one_ NLRI attributes can be successfully parsed? 
> I'm unclear here as to whether we're addressing a common situation where 
> multiple sets of NLRI are being contained within a single message [1]. If so, 
> then adding "at least one" and then a further point that an implementation 
> SHOULD use a single NLRI attribute per UPDATE message, and put this at the 
> start of the attributes would seem to be a fair way forward.

I would suggest that adding the following wording to § 3 of the draft addresses 
this, and clarifies the issue of "lost" NLRI:

"An error SHOULD be defined as Non-Critical if at least one NLRI attribute 
within an erroneous message can be successfully parsed. In cases where more 
than one attribute containing NLRI is included within a single UPDATE message, 
this may result in cases where some NLRI contained within subsequent attributes 
are missed, particularly where length errors exist in the message. In order to 
minimise the risk of such occurrences, it is recommended that an implementation 
SHOULD include only one attribute containing NLRI per message." 
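
As an illustration only (not part of the proposed draft text), the "at least
one" rule above could be sketched roughly as follows. The names NlriAttribute,
Severity and classify_error are hypothetical, and treating every other case as
Critical is an assumption made for the sketch rather than something the wording
above states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Severity(Enum):
    NON_CRITICAL = "non-critical"
    CRITICAL = "critical"


@dataclass
class NlriAttribute:
    # Hypothetical record for one NLRI-carrying attribute in an UPDATE.
    name: str
    parsed_ok: bool


def classify_error(nlri_attrs: List[NlriAttribute]) -> Severity:
    """Classify an erroneous UPDATE using the proposed 'at least one' rule."""
    # Non-Critical as soon as one NLRI-carrying attribute parsed cleanly.
    if any(attr.parsed_ok for attr in nlri_attrs):
        return Severity.NON_CRITICAL
    # Assumption: nothing parseable (or no NLRI attribute found) is Critical.
    return Severity.CRITICAL


def possibly_missed(nlri_attrs: List[NlriAttribute]) -> List[str]:
    """Names of NLRI attributes that failed to parse, whose NLRI may be missed."""
    return [attr.name for attr in nlri_attrs if not attr.parsed_ok]
```

With two NLRI-carrying attributes in one message, where only the first parses,
the error is still classified as Non-Critical, yet the NLRI in the second
attribute is reported as possibly missed. That is the situation the
one-attribute-per-message recommendation is meant to avoid.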

Additionally, following the discussion that Jeff Wheeler raised around repeated
errors, it seems that there is a further requirement for § 5, which I would
suggest that we state as:

"In order to address repeated instances of critical errors, an implementation 
SHOULD provide a means by which an operator can enable such errors to be 
ignored. Where a mechanism of this nature is implemented, it provides a means 
by which an operator may avoid prolonged session failure which results in 
isolation from one, or more, routing domains. An operator deploying such a 
mechanism MUST be aware that holding such sessions up may result in 
inconsistency within the RIB, which may cause incorrect forwarding of traffic 
(e.g., loops, or blackholing)."
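
As a rough sketch of how such an operator control might behave (purely
illustrative; SessionPolicy, ignore_critical_errors and the returned action
strings are hypothetical names, and defaulting the knob to off is an
assumption):

```python
from dataclasses import dataclass


@dataclass
class SessionPolicy:
    # Hypothetical operator knob; assumed to be disabled unless configured.
    ignore_critical_errors: bool = False


def on_critical_error(policy: SessionPolicy) -> str:
    """Return the action to take when a critical error hits a session."""
    if policy.ignore_critical_errors:
        # Session is held up; the RIB may now be inconsistent with the peer,
        # so the operator should be warned (loops or blackholing possible).
        return "keep-session-warn"
    # Without the knob, the critical error resets the session.
    return "reset-session"
```

The point of returning a distinct "keep-session-warn" action is that holding
the session up is never silent: the operator who enabled the knob is told each
time the RIB may have diverged.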

Some feedback on whether this addresses the points discussed over the last few 
days would be appreciated.

Happy New Year,
r.
_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
