Jakob Heitz wrote (on Fri 28-Dec-2012 at 21:35 +0000):
....
> All we can hope for after a malformed update is a
> temporary mitigation until human intervention can
> fix the problem.
>
> IMO, the goal of error handling is to limit the
> damage, not to cure the problem.
From previous discussions it appears that to "limit the damage" you
would avoid session-reset at all costs. Yes?
Seems to me the goal of "Enhanced Error Handling" is, simply, to be
"not as bad as session-reset".
Where all NLRI in a message can be identified, treat-as-withdraw seems
to me to be not as bad as session-reset.
But: "Once you have a malformed update, NOTHING is certain." So,
there is some risk of "lost NLRI" when tolerating (not resetting the
session) malformed updates.
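To make "lost NLRI" concrete, here is a minimal Python sketch. It is not taken from either draft. It models an Adj-RIB-In as a hypothetical dict mapping prefix to path attributes, and it assumes the parser recovered only some of the prefixes carried by a malformed UPDATE. Treat-as-withdraw removes what was recovered. Whatever was not recovered keeps its previous route, which the peer no longer intends.

```python
def treat_as_withdraw(adj_rib_in: dict, recovered_prefixes: set) -> dict:
    """Withdraw every prefix the parser managed to identify.

    Prefixes it failed to identify are left untouched, so any route
    previously held for them survives: these are the "lost NLRI".
    """
    for prefix in recovered_prefixes:
        adj_rib_in.pop(prefix, None)
    return adj_rib_in


# Hypothetical state before the malformed UPDATE arrives.
rib = {
    "10.0.0.0/8": "old-path-A",
    "192.0.2.0/24": "old-path-B",
}
# The UPDATE re-announced both prefixes with new attributes, but
# (by assumption) the parser only recovered the first one.
intended = {"10.0.0.0/8", "192.0.2.0/24"}
recovered = {"10.0.0.0/8"}

treat_as_withdraw(rib, recovered)
lost = intended - recovered
print(sorted(rib))   # the stale route for 192.0.2.0/24 is still there
print(sorted(lost))
```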
I don't know if "lost NLRI" can cause problems which are worse than
session-reset. I cannot find anything in
draft-ietf-grow-ops-reqs-for-bgp-error-handling-06 or
draft-ietf-idr-error-handling-03 which addresses the issue. (Though
draft-ietf-idr-error-handling-03 waxes lyrical on "Why not discard
UPDATE messages ?")
If the strategy is to treat-as-withdraw where you can, and continue
with whatever "lost NLRI" there may be, then that's simple enough.
But if that is the strategy, why do the drafts not say so? The
"ops-reqs" draft sets out to define Critical and Non-Critical errors
in terms of the inability/ability to extract NLRI, which is not
obviously consistent with this strategy.
It may be that "lost NLRI" are, indeed, potentially (in theory) worse
than session-reset, but that the risk is (in practice) reduced to a
tolerable level if a certain amount of care is taken to extract NLRI
-- falling back to session-reset if that process fails. This seems to
me to be quite a sophisticated position -- too sophisticated to be
simply implied by what the drafts do not specify.
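The "sophisticated position" amounts to a two-way decision per malformed UPDATE. The Python sketch below is an assumption-laden illustration, not what either draft specifies. `ErrorAction` and `handle_malformed_update` are hypothetical names, and the careful NLRI extraction step is assumed to have already run, yielding either a list of prefixes or `None` when it could not identify the NLRI with confidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorAction:
    """Outcome of handling one malformed UPDATE (hypothetical type)."""
    reset_session: bool
    withdraw: List[str] = field(default_factory=list)


def handle_malformed_update(extracted_nlri: Optional[List[str]]) -> ErrorAction:
    """Treat-as-withdraw when the NLRI were extracted, else session-reset.

    extracted_nlri is None when the (assumed) extraction step failed;
    otherwise it holds every prefix that step identified.
    """
    if extracted_nlri is None:
        # Cannot tell which routes are affected: fall back to session-reset.
        return ErrorAction(reset_session=True)
    # NLRI identified: withdraw them and keep the session up.
    return ErrorAction(reset_session=False, withdraw=list(extracted_nlri))
```

Note that the sketch says nothing about how trustworthy a "successful" extraction is; that judgment is exactly what the position above leaves unstated.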
On the other hand, if "lost NLRI" are not as bad as session-reset,
then:
a) when parsing the NLRI themselves, why not treat-as-withdraw
   everything up to the point where a prefix length overruns the
   NLRI collection?
b) when parsing MP_XXX attributes, why worry about duplicates?
   Why not treat-as-withdraw everything in sight?
c) when parsing the message layer, why worry about consistency
   between the message length and the Withdrawn Routes Length and
   Total Path Attribute Length fields? Since "lost NLRI" are
   tolerable, why worry about losing some IPv4 Unicast NLRI?
d) the presence of 16 x 0xFF is surely sufficient to signal the
   start of a BGP message. So, when starting to read a message
   (after having consumed the number of octets given by the
   previous message's Length field), why not scan forwards
   looking for 16 x 0xFF?
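Items (a) and (d) are easy enough to sketch. The Python below is illustrative only, and the function names are hypothetical. It decodes IPv4 Unicast NLRI using the RFC 4271 encoding (a one-octet prefix length in bits, followed by just enough octets to hold the prefix). It keeps every prefix decoded before the first one that overruns the field, and it scans a byte stream for the next 16 x 0xFF marker.

```python
from typing import List, Tuple

BGP_MARKER = b"\xff" * 16  # RFC 4271 header marker: 16 octets of all ones


def parse_nlri_tolerant(data: bytes) -> Tuple[List[str], bool]:
    """Item (a): decode IPv4 NLRI, keeping prefixes seen before an overrun.

    Returns (prefixes, overran). Nothing after the overrun is trusted;
    a caller following (a) would treat-as-withdraw `prefixes`.
    """
    prefixes: List[str] = []
    i = 0
    while i < len(data):
        length = data[i]              # prefix length in bits
        nbytes = (length + 7) // 8    # octets needed to hold the prefix
        if length > 32 or i + 1 + nbytes > len(data):
            return prefixes, True     # invalid length or overruns the field
        raw = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)  # pad to 4 octets
        prefixes.append(".".join(str(b) for b in raw) + f"/{length}")
        i += 1 + nbytes
    return prefixes, False


def resync_to_marker(stream: bytes, start: int) -> int:
    """Item (d): offset of the next 16 x 0xFF marker at or after start, or -1."""
    return stream.find(BGP_MARKER, start)


# Example: two good prefixes, then a /20 that needs 3 octets but has only 2.
print(parse_nlri_tolerant(bytes([8, 10, 24, 192, 0, 2, 20, 198, 51])))
```

The marker scan shows why (d) is a gamble: 0xFF octets are perfectly legal inside a message body (an all-ones prefix, say), so a match is not proof that a message starts there.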
I suspect the drafts do not go this far because there are unspoken
assumptions about what is worse than session-reset; either that, or
because the issue of "lost NLRI" has been glossed over. Neither, I
suggest, is very satisfactory.
Chris
_______________________________________________
GROW mailing list
grow@ietf.org
https://www.ietf.org/mailman/listinfo/grow