+1.. Jim Uttaro
From: Enke Chen [mailto:[email protected]]
Sent: Friday, June 22, 2012 2:00 PM
To: [email protected]
Cc: [email protected] List; [email protected]; UTTARO, JAMES; Enke Chen
Subject: Re: [Idr] Fwd: [GROW] draft-ietf-grow-ops-reqs-for-bgp-error-handling-04

Hi, folks:

It might help the discussion to refresh ourselves about several large outages in the last few years that prompted the work on the error handling requirements and solutions:

   o issue with AS4_PATH that resulted in session resets multiple hops away (two separate incidents)
   o session reset triggered by a single route with a new attribute

I remember that Rob had a presentation at the NANOG on the topic.

-- Enke

On 6/22/12 8:57 AM, Robert Raszuk wrote:

Jim,

> We could as easily without any change to BGP use BGP Persistence to maintain the paths except for the ones that have the invalid attribute.. This is the simpler method, has the benefit of not changing BGP, or educating the world on the nuances of the changes etc...
>
> + Why wouldn't we simply let the session fail and then use BGP Persistence or GR ;)

Please observe that when the session is down you are not receiving withdraws or new best paths for those "good" prefixes (maybe 99% of them) which did not have any errors in their respective update messages. Equating it with persistence proposal is therefore highly incorrect.

> I also do not fully understand "treat as withdraw" does this mean that the peer who has received an update with P1-PN with malformed attr then initiate a withdrawal to all of its peers? Or simply assume that the paths have been received as a message? Some sample topologies as to how this works would be a good addition to this section..

The speaker reacting on an error which can be addressed by "treat-as-withdraw" invalidates locally those prefixes received in the update message, runs local best path and as result if no other path is found withdraws those prefixes from all peers it has previously sent them to.

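
The treat-as-withdraw reaction described above can be sketched roughly as follows. This is only a toy model under stated assumptions: the `Speaker` class, its method names, and the "shortest AS-path length wins" best-path rule are hypothetical simplifications introduced for illustration, not taken from the draft or from any router implementation. It shows that only the prefixes carried in the malformed update are invalidated, while the session and all other prefixes from that peer stay in place.

```python
class Speaker:
    """Toy BGP speaker (hypothetical model, for illustration only)."""

    def __init__(self, peers):
        self.peers = list(peers)
        self.rib_in = {}      # prefix -> {peer: AS-path length}
        self.best = {}        # prefix -> peer providing the best path
        self.advertised = {}  # prefix -> set of peers we announced it to
        self.sent = []        # (to_peer, "announce" | "withdraw", prefix)

    def _run_best_path(self, prefix):
        paths = self.rib_in.get(prefix, {})
        old = self.best.get(prefix)
        if not paths:
            # No path left: withdraw from every peer we previously sent it to.
            self.best.pop(prefix, None)
            for peer in sorted(self.advertised.pop(prefix, set())):
                self.sent.append((peer, "withdraw", prefix))
            return
        # Assumption: shortest AS-path length wins, peer name breaks ties.
        new = min(paths, key=lambda p: (paths[p], p))
        if new == old:
            return
        self.best[prefix] = new
        had = self.advertised.get(prefix, set())
        want = {p for p in self.peers if p != new}  # never send back to source
        for peer in sorted(had - want):
            self.sent.append((peer, "withdraw", prefix))
        for peer in sorted(want):
            self.sent.append((peer, "announce", prefix))
        self.advertised[prefix] = want

    def receive_update(self, peer, prefixes, as_path_len):
        """Well-formed update: install the paths and rerun best path."""
        for prefix in prefixes:
            self.rib_in.setdefault(prefix, {})[peer] = as_path_len
            self._run_best_path(prefix)

    def treat_as_withdraw(self, peer, prefixes):
        """Malformed update: invalidate only the prefixes it carried.

        The session to `peer` stays up; its other prefixes are untouched.
        """
        for prefix in prefixes:
            self.rib_in.get(prefix, {}).pop(peer, None)
            self._run_best_path(prefix)


# Sample topology: this speaker has three peers A, B and C.
s = Speaker(["A", "B", "C"])
s.receive_update("A", ["10.0.0.0/24", "10.0.1.0/24"], as_path_len=2)
s.receive_update("A", ["10.0.2.0/24"], as_path_len=2)
s.receive_update("B", ["10.0.0.0/24"], as_path_len=3)
s.sent.clear()

# A now sends an update for two prefixes with a malformed attribute.
s.treat_as_withdraw("A", ["10.0.0.0/24", "10.0.1.0/24"])
for message in s.sent:
    print(message)
```

In this sample, 10.0.0.0/24 falls back to the path learned from B, 10.0.1.0/24 has no alternative and is withdrawn from B and C, and 10.0.2.0/24 (not in the malformed update) keeps its path via A.
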
> I am not in support of solutions which create a scenario where BGP cannot recover without human intervention.

I think no one is. But we are - I think - not there yet for the routers to automatically fix their bugs, but only automatically signalling them the requested action ;(.

> Nothing is going to get people's attention like a failed BGP
> Session..

True statement. But the entire assumption behind treat-as-withdraw is that your ops scripts parse the syslog messages indicating the issue to NOC with the same red color and buzz as bgp session down. Of course you need to rework your ops scripts/alarms for that to happen.

Rgs,
R.

PS. Note that if the main BGP session is down (like in the persistence case) BGP Operational Messages can no longer be exchanged between peers as TCP connection could have been reset (if no multisession is used and if we are talking about single SAFI). That just makes the issue worse especially when you do not like to have human intervention.

_______________________________________________
Idr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/idr
