Jakob Heitz wrote:
> On Dec 31, 2012, at 1:21 AM, "Robert Raszuk" wrote:
>> On Mon, Dec 31, 2012 at 4:09 AM, John Leslie wrote:
>>> Brian Dickson wrote:
>>>>
>>>> But, the basic problem is this: missing an UPDATE won't trigger either
>>>> condition, it will at worst cause sub-optimal routing.

I apologize for careless reading here. I took Brian to mean that failing
to process an UPDATE which changed the NLRI from a peer, but didn't
change the _reachability_ of that block from that peer, breaks things,
but only mildly.

>>>> Missing a WITHDRAW _CAN_ cause Bad Things (TM) to happen.
>>>
>>> +1

I meant to +1 the statement that missing the notice that an NLRI was
being withdrawn and not replaced (thereby changing reachability from
positive to negative) is a much more serious breakage, since you may
keep sending packets to a peer which no longer has the reachability you
think it does.

>> How about if we would mandate Enhance Route Refresh request to be send
>> to such peer who's bad updates triggered treat-as-withdraw action ?

A Route-Refresh cycle would ensure that we have the actual reachability
status of that peer (if the Route-Refresh completes) instead of some
outdated NLRI the peer intended to withdraw without replacement.

>> Yes that means that "treat-as-withdraw" should be applied only to
>> those peers which support Enhanced Route Refresh.

We're discussing an area where "treat-as-withdraw" may no longer be a
useful term, since we're considering an UPDATE so badly formed that we
can't even extract which CIDR blocks are the subject of it.

Personally, I'd prefer a Route-Refresh approach, where we make it
perfectly clear to the peer in question that we no longer have a
trustworthy state of its current advertised routing.

> I don't think treat-as-withdraw is trying to fix a single session reset.
> Graceful restart can fix that. It's the rolling resets that need a
> human to remove a buggy router or a config that triggered the bug.
> That takes several hours. Treat-as-withdraw limits the damage during
> those hours.

I agree with Jakob that we're _trying_ to discuss that -- but I don't
agree we're succeeding very well.

> Could we please settle on that without trying to solve the impossible?

I'm not sure we can...

First of all, most of us _aren't_ trying to solve the impossible: we're
trying to design an error-management paradigm which passes _enough_
information on which to base intelligent routing decisions.

And, I'm not sure what it means to "settle on" a "treat-as-withdraw"
that some of us consider "good enough" for a "limited" period while
humans are working to resolve the actual problem. We know that the
"limits" on such a period correlate all too well with the amount of
pain being inflicted on paying customers.

--
John Leslie

_______________________________________________
GROW mailing list
https://www.ietf.org/mailman/listinfo/grow
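
For illustration only, the handling options debated in this thread can be sketched as a small decision function. This is not taken from any draft or implementation. The names `Peer`, `Action`, and `handle_malformed_update` are hypothetical, and falling back to a session reset for peers without Enhanced Route Refresh is an assumption based on the conventional behavior for UPDATE errors, not something the thread settles.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Action(Enum):
    TREAT_AS_WITHDRAW = auto()      # drop the routes named in the bad UPDATE
    REQUEST_ROUTE_REFRESH = auto()  # ask the peer to resend its current routes
    RESET_SESSION = auto()          # assumed fallback: tear the session down


@dataclass
class Peer:
    name: str
    supports_enhanced_route_refresh: bool  # hypothetical capability flag


def handle_malformed_update(peer: Peer, nlri: Optional[List[str]]) -> List[Action]:
    """Pick actions for a malformed UPDATE received from `peer`.

    `nlri` holds the prefixes that could still be extracted from the
    message, or None/empty when the UPDATE is too damaged to tell which
    CIDR blocks it was about.
    """
    # Robert's proposal: only apply treat-as-withdraw to peers that
    # support Enhanced Route Refresh. Anything else falls back to a
    # reset (an assumption made for this sketch).
    if not peer.supports_enhanced_route_refresh:
        return [Action.RESET_SESSION]

    if nlri:
        # The prefixes are known: withdraw them, then refresh so that a
        # withdraw-without-replacement is not silently lost.
        return [Action.TREAT_AS_WITHDRAW, Action.REQUEST_ROUTE_REFRESH]

    # Nothing can be extracted, so "treat-as-withdraw" has no subject.
    # John's preference: tell the peer our view of its routes is no
    # longer trustworthy and ask for a full refresh.
    return [Action.REQUEST_ROUTE_REFRESH]


if __name__ == "__main__":
    peer = Peer("peer-a", supports_enhanced_route_refresh=True)
    print(handle_malformed_update(peer, ["192.0.2.0/24"]))
    print(handle_malformed_update(peer, None))
```

The sketch keeps the two cases from the discussion separate on purpose: when the affected prefixes are known, withdrawing them bounds the damage, while the refresh is what restores a trustworthy picture of the peer's reachability.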
