Jakob Heitz wrote (on Fri 28-Dec-2012 at 21:35 +0000):
....
> All we can hope for after a malformed update is a
> temporary mitigation until human intervention can
> fix the problem.
>
> IMO, the goal of error handling is to limit the
> damage, not to cure the problem.

From previous discussions it appears that to "limit the damage" you
would avoid session-reset at all costs.  Yes ?

Seems to me the goal of "Enhanced Error Handling" is, simply, to be
"not as bad as session-reset".

Where all NLRI in a message can be identified, treat-as-withdraw seems
to me to be not as bad as session-reset.

But: "Once you have a malformed update, NOTHING is certain."  So,
there is some risk of "lost NLRI" when tolerating (not resetting the
session) malformed updates.

I don't know if "lost NLRI" can cause problems which are worse than
session-reset.  I cannot find anything in
draft-ietf-grow-ops-reqs-for-bgp-error-handling-06 or
draft-ietf-idr-error-handling-03 which addresses the issue.  (Though
draft-ietf-idr-error-handling-03 waxes lyrical on "Why not discard
UPDATE messages ?")  

If the strategy is to treat-as-withdraw where you can, and continue
with whatever "lost NLRI" there may be, then that's simple enough.
But if that is the strategy, why do the drafts not say so ?  The
"ops-reqs" draft sets out to define Critical and Non-Critical errors
in terms of the inability/ability to extract NLRI, which is not
obviously consistent with this strategy.

It may be that "lost NLRI" are, indeed, potentially (in theory) worse
than session-reset, but that the risk is (in practice) reduced to a
tolerable level if a certain amount of care is taken to extract NLRI
-- falling back to session-reset if that process fails.  This seems to
me to be quite a sophisticated position -- too sophisticated to be
simply implied by what the drafts do not specify.
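
To make that position concrete, here is a minimal sketch in Python.  It
assumes an IPv4 Unicast UPDATE body (the octets after the 19-octet
header, laid out as in RFC 4271) and ignores MP_REACH_NLRI and
MP_UNREACH_NLRI entirely.  The function names and the action strings
are hypothetical; neither draft specifies anything like them.

```python
import struct


def split_update_body(body):
    """Split an UPDATE body into (withdrawn, attributes, nlri).

    Returns None if the two length fields do not fit inside the body.
    """
    if len(body) < 4:
        return None
    (wlen,) = struct.unpack_from("!H", body, 0)
    if 2 + wlen + 2 > len(body):
        return None
    (alen,) = struct.unpack_from("!H", body, 2 + wlen)
    attrs_start = 2 + wlen + 2
    if attrs_start + alen > len(body):
        return None
    attrs_end = attrs_start + alen
    return body[2:2 + wlen], body[attrs_start:attrs_end], body[attrs_end:]


def parse_prefixes(data, max_bits=32):
    """Parse <length-in-bits, prefix> pairs.

    Returns a list of (bits, prefix_octets), or None if any prefix is
    too long or runs past the end of the data.
    """
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8
        if bits > max_bits or i + 1 + nbytes > len(data):
            return None
        prefixes.append((bits, data[i + 1:i + 1 + nbytes]))
        i += 1 + nbytes
    return prefixes


def handle_malformed_update(body):
    """Decide what to do with an UPDATE whose attributes are malformed.

    Hypothetical policy: treat-as-withdraw only if every NLRI can be
    extracted; otherwise fall back to session-reset.
    """
    parts = split_update_body(body)
    if parts is None:
        return "session-reset", []
    withdrawn, _attrs, nlri = parts
    w = parse_prefixes(withdrawn)
    n = parse_prefixes(nlri)
    if w is None or n is None:
        return "session-reset", []
    return "treat-as-withdraw", w + n
```

Note that the sketch never looks inside the attributes: it trusts the
attribute length field to locate the NLRI, which is exactly the kind
of assumption that "NOTHING is certain" calls into question.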

On the other hand, if "lost NLRI" are not as bad as session-reset,
then:

  a) when parsing the NLRI themselves, why not treat-as-
     withdraw everything up to the point where a prefix
     length overruns the NLRI collection ?

  b) when parsing MP_XXX attributes, why worry about
     duplicates ?  Why not treat-as-withdraw everything
     in sight ?

  c) when parsing the message layer, why worry about
     consistency of message length and withdrawn
     routes and attributes length ?

     Since "lost NLRI" are tolerable, why worry about
     losing some IPv4 Unicast NLRI ?

  d) the presence of 16 x 0xFF is surely sufficient
     to signal the start of a BGP Message.  So, when
     starting to read a message (after having read
     message length octets for the previous one),
     why not scan forwards looking for 16 x 0xFF ?
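
For illustration only, (a) and (d) might look like the following
Python sketch.  It assumes IPv4 prefixes encoded as a length in bits
followed by the minimum number of octets; the function names are
hypothetical, and nothing here is proposed by either draft.

```python
MARKER = b"\xff" * 16  # the 16 x 0xFF at the start of a BGP message


def salvage_prefixes(data, max_bits=32):
    """Item (a): collect prefixes until one overruns the collection.

    Returns (prefixes, complete).  'complete' is False if parsing
    stopped early, in which case any remaining NLRI are "lost".
    """
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8
        if bits > max_bits or i + 1 + nbytes > len(data):
            return prefixes, False
        prefixes.append((bits, data[i + 1:i + 1 + nbytes]))
        i += 1 + nbytes
    return prefixes, True


def resync(stream, start=0):
    """Item (d): offset of the next 16 x 0xFF at or after 'start'.

    Returns -1 if no such run is present in the buffered octets.
    """
    return stream.find(MARKER, start)
```

Everything returned by salvage_prefixes() would be treated as
withdrawn; whatever followed the overrun is simply never seen.  The
sketch does not consider whether a run of 16 x 0xFF could also occur
inside a message.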

I suspect the drafts do not go this far because there are unspoken
assumptions about what is worse than session-reset; either that, or
because the issue of "lost NLRI" has been glossed over.  Neither of
which, I suggest, is very satisfactory.

Chris

