Re: [GROW] [Idr] I-D Action: draft-ietf-grow-ops-reqs-for-bgp-error-handling-06.txt

John Leslie Wed, 02 Jan 2013 08:02:06 -0800

Jakob Heitz <[email protected]> wrote:
> On Dec 31, 2012, at 1:21 AM, "Robert Raszuk" <[email protected]> wrote:
>> On Mon, Dec 31, 2012 at 4:09 AM, John Leslie <[email protected]> wrote:
>>> Brian Dickson <[email protected]> wrote:
>>>> 
>>>> But, the basic problem is this: missing an UPDATE won't trigger either
>>>> condition, it will at worst cause sub-optimal routing.


   I apologize for careless reading here. I took Brian to mean that
failing to process an UPDATE which changed the NLRI from a peer, but
didn't change the _reachability_ of that block from that peer, breaks
things, but only mildly.

>>>> Missing a WITHDRAW _CAN_ cause Bad Things (TM) to happen.
>>> 
>>>   +1

   I meant to +1 the statement that missing the notice that an NLRI
was being withdrawn and not replaced (thereby changing reachability
from positive to negative) is a much more serious breakage, since
you may keep sending packets to that peer which no longer has the
reachability you think it does.

>> How about if we would mandate Enhanced Route Refresh request to be sent
>> to such peer whose bad updates triggered treat-as-withdraw action?

   A Route-Refresh cycle would ensure that we have the actual
reachability status of that peer (if the Route-Refresh completes)
instead of some outdated NLRI the peer intended to withdraw without
replacement.

>> Yes that means that "treat-as-withdraw" should be applied only to
>> those peers which support Enhanced Route Refresh.

   We're discussing an area where "treat-as-withdraw" may no longer
be a useful term, since we're considering an UPDATE so badly formed
that we can't even extract which CIDR blocks are the subject of it.

   Personally, I'd prefer a Route-Refresh approach, where we make it
perfectly clear to the peer in question that we no longer have a
trustworthy state of its current advertised routing.
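   To make the distinction concrete, here is a toy sketch in Python. It
is an illustration only, not anything from the draft or from a real BGP
implementation; AdjRibIn, handle_malformed_update, receive_update and
end_of_refresh are hypothetical names. It assumes the peer honours a
Route-Refresh and that we can tell when the resent table is complete
(some end-of-refresh signal, such as Enhanced Route Refresh provides).
When the NLRI can be parsed, it applies treat-as-withdraw to just those
prefixes and, per Robert's suggestion, also requests a refresh; when it
can't, every route from that peer is marked stale and only what the
peer re-advertises survives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AdjRibIn:
    """Hypothetical toy model of the routes learned from one BGP peer."""
    routes: Dict[str, str] = field(default_factory=dict)  # prefix -> attributes
    stale: Set[str] = field(default_factory=set)
    refresh_pending: bool = False


def handle_malformed_update(rib: AdjRibIn, nlri: Optional[List[str]]) -> str:
    """Hypothetical error handler; nlri is None when prefixes can't be parsed."""
    if nlri is not None:
        # Prefixes are known: treat-as-withdraw removes only those routes.
        for prefix in nlri:
            rib.routes.pop(prefix, None)
        action = "treat-as-withdraw"
    else:
        # Prefixes are unknown: no route from this peer can be trusted,
        # so mark all of them stale until the peer re-advertises them.
        rib.stale = set(rib.routes)
        action = "mark-stale"
    # In both cases, ask the peer to resend its full table (Route-Refresh).
    rib.refresh_pending = True
    return action


def receive_update(rib: AdjRibIn, prefix: str, attrs: str) -> None:
    """A well-formed (re-)advertisement installs the route and clears its stale mark."""
    rib.routes[prefix] = attrs
    rib.stale.discard(prefix)


def end_of_refresh(rib: AdjRibIn) -> None:
    """Refresh complete: anything still stale was not re-advertised, so drop it."""
    for prefix in rib.stale:
        rib.routes.pop(prefix, None)
    rib.stale.clear()
    rib.refresh_pending = False


# Usage: an UPDATE so malformed we can't tell which prefixes it covered.
rib = AdjRibIn(routes={"192.0.2.0/24": "path-A", "198.51.100.0/24": "path-B"})
handle_malformed_update(rib, None)
receive_update(rib, "192.0.2.0/24", "path-A2")  # peer re-advertises one prefix
end_of_refresh(rib)
print(rib.routes)  # {'192.0.2.0/24': 'path-A2'}
```

The same model shows why the refresh matters after treat-as-withdraw: a
prefix removed in error comes back as soon as the peer re-advertises it
during the refresh.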

> I don't think treat-as-withdraw is trying to fix a single session reset.
> Graceful restart can fix that. It's the rolling resets that need a
> human to remove a buggy router or a config that triggered the bug.
> That takes several hours. Treat-as-withdraw limits the damage during
> those hours.

   I agree with Jakob that we're _trying_ to discuss that -- but I
don't agree we're succeeding very well.

> Could we please settle on that without trying to solve the impossible?

   I'm not sure we can...

   First of all, most of us _aren't_ trying to solve the impossible:
we're trying to design an error-management paradigm which passes _enough_
information on which to base intelligent routing decisions.

   And, I'm not sure what it means to "settle on" a "treat-as-withdraw"
that some of us consider "good enough" for a "limited" period while
humans are working to resolve the actual problem. We know that the
"limits" on such a period correlate all too well with the amount of
pain being inflicted on paying customers.

--
John Leslie <[email protected]>
_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
