Hi Chris,

(re: CCing IDR & GROW)

On 28 Dec 2012, at 12:28, Chris Hall wrote:

> Rob Shakir wrote (on Thu 27-Dec-2012 at 18:44):
>> 
>> Any comments very welcome (to me or grow@).
> 
> I'm afraid I still don't get it :-(  What am I missing ?
> 
> UPDATE Message Length errors are Critical because they (1) "result in
> cases whereby the NLRI attribute cannot be correctly extracted".
> 
> The implication is that a failure to extract all NLRI is Critical.  Is
> that a requirement ?

If the NLRI cannot be determined, then this is a Critical error, yes. I left 
the wording relatively open on whether this is *all* NLRI, as I am not sure 
that the requirements draft should specify direct solutions to specific 
issues, e.g., how to handle cases where MP_REACH_NLRI and MP_UNREACH_NLRI are 
in the same message [this is a case that I do not believe is forbidden by 
rfc2858 - if the working group could clarify whether this is something that 
the draft needs to handle or can explicitly omit, then that would be 
appreciated].

> 
> Later: 
> 
>  (2) "All errors whereby the contained NLRI can be
>       extracted are referred to as Non-Critical". 
> 
> And that includes:
> 
>  (3) "where the length of all path attributes contained
>       within the UPDATE does not correspond to the
>       total path attribute length."
> 
> That is, at least, more explicit than
> draft-ietf-idr-error-handling-03, which glosses over (3).
> 
> But if (3) is non-critical then there is some chance that some NLRI
> will not be extracted, which appears to violate (1) and (2).

Disclaimer: As I am sure that my comments previously have made clear, I do not 
maintain a code base for a BGP daemon/implementation - so please feel free to 
correct my logic below.

I do not believe that (3) implies that the NLRI cannot be correctly found. If 
the individual attribute lengths do not sum to the total path attribute 
length, then we can still extract the individual attributes - we just find 
that there is not enough data to fill the overall length we were told and/or 
we have too much attribute data compared to the total attribute length. In the 
case where the NLRI attribute itself has a length error, then this is a 
Critical error (based on the "Errors parsing the NLRI attribute of an UPDATE 
message" definition of Critical error), and a similar Critical error occurs 
where the Total Path Attributes + Withdrawn Routes are not equal to the total 
UPDATE message length.
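
To make the distinction above concrete, here is a rough Python sketch of how a 
receiver might walk the path attributes and classify the two kinds of length 
mismatch. It is only an illustration under my reading of the text above, not 
code from any implementation or from either draft: the function name 
classify_update and the three return strings are made up for this example, it 
only flags an UPDATE whose declared lengths overrun the message (rather than 
checking equality), and it does not parse the NLRI contents themselves.

```python
import struct

# Attribute type codes for the multiprotocol NLRI attributes.
NLRI_ATTRS = {14, 15}          # MP_REACH_NLRI, MP_UNREACH_NLRI
EXTENDED_LENGTH = 0x10         # attribute flag: length field is 2 octets


def classify_update(body: bytes) -> str:
    """Classify an UPDATE body (the octets after the 19-octet BGP header).

    Hypothetical helper: returns "ok", "non-critical" or "critical".
    """
    if len(body) < 4:
        return "critical"
    (wd_len,) = struct.unpack_from("!H", body, 0)
    attrs_start = 2 + wd_len + 2
    if attrs_start > len(body):
        return "critical"      # withdrawn routes overrun the message
    (total_attr_len,) = struct.unpack_from("!H", body, 2 + wd_len)
    attrs_end = attrs_start + total_attr_len
    if attrs_end > len(body):
        return "critical"      # declared lengths overrun the message

    pos = attrs_start
    while pos < attrs_end:
        remaining = attrs_end - pos
        if remaining < 3:
            # Stray octets that cannot even hold an attribute header.
            return "non-critical"
        flags, type_code = body[pos], body[pos + 1]
        hdr = 4 if flags & EXTENDED_LENGTH else 3
        if remaining < hdr:
            attr_len = None    # header itself is truncated
        elif hdr == 4:
            (attr_len,) = struct.unpack_from("!H", body, pos + 2)
        else:
            attr_len = body[pos + 2]
        if attr_len is None or hdr + attr_len > remaining:
            # The attribute does not fit in the declared total length.
            # Only a damaged NLRI attribute is treated as Critical here.
            return "critical" if type_code in NLRI_ATTRS else "non-critical"
        pos += hdr + attr_len
    return "ok"
```

In this sketch a mismatch confined to ordinary attributes leaves the attributes 
parsed so far usable, whereas a length error in MP_REACH_NLRI/MP_UNREACH_NLRI, 
or one that overruns the message itself, is reported as Critical.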

Either way -- again, I would say that this is something that we need to put 
text together for draft-ietf-idr-error-handling rather than the requirements 
document (this sounds like a solution, and the requirement does not have a 
SHOULD or MUST here, it is an "it is expected that…" comment).

> Then (4) "In order to maximise the number of cases whereby the NLRI
> attributes [plural, now, BTW] can be reliably extracted from a
> received message...".  Ah.  So it is not a Critical Error if "the NLRI
> attribute cannot be correctly extracted".

No - it is a Critical error if we cannot extract the NLRI. This recommendation 
is to give an increased chance that the NLRI can be extracted as per the IDR 
error handling draft. This then (by virtue of resulting in the NLRI being 
extracted) minimises the number of cases that result in a Critical error. The 
plural here reflects the existence of more than one type of NLRI attribute.

> For me the requirement remains "conflicted".  On the one hand it seems
> to say that it is a Critical Error if the NLRI cannot be extracted and
> parsed.  On the other it seems to say it's OK if you cannot extract
> some NLRI.

If you'll forgive me for removing a significant proportion of your message, I 
think that we need to take another step back here. It seems to me that the key 
question that you are highlighting is "What level of confidence do we need to 
have before we declare that the NLRI cannot be extracted?" -- do you agree?

From an operator perspective, I would like to trade some *certainty* for 
*robustness*. You are right, we are compromising correctness here: we might 
end up withdrawing an incorrect NLRI and impacting service operation for that 
prefix - however, it is somewhat preferable to me to withdraw a subset of 
the NLRI incorrectly, rather than impact all NLRI in one single action. We 
clearly need to provide some bounds on how much we compromise the certainty 
(and live within the realms of possibility, such that we are not just taking a 
shot in the dark). This is what the definitions of Critical and Non-Critical 
within the document are intended to provide. Once again I will refer to the 
requirement that there is a balance between correctness and robustness - 
rather than a locally risk averse approach that results in harmful wider 
behaviour.

Is it acceptable that we leave this as guidance within the requirements? If 
not, please could you suggest how the definitions of Critical/Non-Critical 
could be altered to address your concerns? I would also appreciate further 
input from IDR as to whether this is a sufficient requirement from GROW to 
allow a solution document to be written.

Many thanks,
r.


