+1..

Jim Uttaro

From: Enke Chen [mailto:[email protected]]
Sent: Friday, June 22, 2012 2:00 PM
To: [email protected]
Cc: [email protected] List; [email protected]; UTTARO, JAMES; Enke Chen
Subject: Re: [Idr] Fwd: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi, folks:

It might help the discussion to remind ourselves of several large outages 
in the last few years that prompted the work on the error handling requirements 
and solutions:

   o issue with AS4_PATH that resulted in session resets multiple hops away 
     (two separate incidents)
   o session reset triggered by a single route with a new attribute

I remember that Rob gave a presentation at NANOG on the topic.

-- Enke

On 6/22/12 8:57 AM, Robert Raszuk wrote:
Jim,


We could just as easily, without any change to BGP, use BGP Persistence to
maintain the paths except for the ones that have the invalid
attribute. This is the simpler method, and has the benefit of not
changing BGP, or educating the world on the nuances of the changes
etc...

Why wouldn't we simply let the session fail and then use BGP Persistence
or GR ;)

Please observe that when the session is down you are not receiving withdraws or 
new best paths for those "good" prefixes (maybe 99% of them) which did not have 
any errors in their respective update messages.

Equating it with the persistence proposal is therefore highly incorrect.


I also do not fully understand "treat as withdraw". Does this mean that
the peer that has received an update with P1-PN with a malformed attr then
initiates a withdrawal to all of its peers?  Or simply assume that the
paths have been received as a message?  Some sample topologies as to how
this works would be a good addition to this section.

The speaker reacting to an error which can be addressed by "treat-as-withdraw" 
locally invalidates those prefixes received in the update message, runs local 
best path and, as a result, if no other path is found, withdraws those prefixes 
from all peers it has previously sent them to.
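
The sequence above can be sketched in a few lines of Python. This is an 
illustration only: the Speaker class, its fields, and the single integer 
preference standing in for best path selection are hypothetical 
simplifications, not taken from the draft or from any real BGP implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    # rib_in[prefix][peer] = preference (higher wins). Hypothetical stand-in
    # for the full BGP decision process.
    rib_in: dict = field(default_factory=dict)
    # best[prefix] = peer whose path is currently selected (and advertised).
    best: dict = field(default_factory=dict)

    def learn(self, peer, prefix, pref):
        """Install a path received from `peer` and rerun selection."""
        self.rib_in.setdefault(prefix, {})[peer] = pref
        self._select(prefix)

    def _select(self, prefix):
        """Pick the best remaining path; return False if none is left."""
        paths = self.rib_in.get(prefix, {})
        if paths:
            self.best[prefix] = max(paths, key=paths.get)
            return True
        self.best.pop(prefix, None)
        return False

    def treat_as_withdraw(self, peer, prefixes):
        """Handle a malformed UPDATE from `peer` carrying `prefixes`.

        Returns the prefixes that must be withdrawn from downstream peers,
        i.e. those that were advertised before and now have no path at all.
        """
        to_withdraw = []
        for prefix in prefixes:
            was_advertised = prefix in self.best
            # Invalidate only the path learned from the offending peer.
            self.rib_in.get(prefix, {}).pop(peer, None)
            if not self._select(prefix) and was_advertised:
                to_withdraw.append(prefix)
        return to_withdraw


s = Speaker()
s.learn("A", "10.0.0.0/8", 100)
s.learn("B", "10.0.0.0/8", 90)
s.learn("A", "192.0.2.0/24", 100)
withdrawn = s.treat_as_withdraw("A", ["10.0.0.0/8", "192.0.2.0/24"])
```

In this toy run, 10.0.0.0/8 falls back to the path via B and stays 
advertised, while 192.0.2.0/24 has no alternative and is the only prefix 
withdrawn. The session with A stays up throughout, so A can still send 
withdraws and new best paths for its unaffected prefixes.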


I am not in support of solutions which create a scenario where BGP
cannot recover without human intervention.

I think no one is. But we are, I think, not there yet for routers to 
automatically fix their bugs; we can only automatically signal the 
requested action to them ;(.

> Nothing is going to get people's attention like a failed BGP
> Session..

True statement. But the entire assumption behind treat-as-withdraw is that your 
ops scripts parse the syslog messages and flag the issue to the NOC with the 
same red color and buzz as a BGP session down. Of course you need to rework 
your ops scripts/alarms for that to happen.

Rgs,
R.

PS.

Note that if the main BGP session is down (as in the persistence case), BGP 
Operational Messages can no longer be exchanged between peers, as the TCP 
connection could have been reset (if no multisession is used and if we are 
talking about a single SAFI). That just makes the issue worse, especially when 
you do not want human intervention.

_______________________________________________
Idr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/idr

_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
