Hi, folks:

It might help the discussion to recall several large outages from the last few years that prompted the work on the error-handling requirements and solutions:

   o an issue with AS4_PATH that resulted in session resets multiple hops away (two separate incidents)
   o a session reset triggered by a single route with a new attribute

I remember that Rob gave a presentation on the topic at NANOG.

-- Enke

On 6/22/12 8:57 AM, Robert Raszuk wrote:
Jim,

> We could as easily, without any change to BGP, use BGP Persistence to
> maintain the paths except for the ones that have the invalid
> attribute. This is the simpler method, and has the benefit of not
> changing BGP, or educating the world on the nuances of the changes
> etc...
+
> Why wouldn't we simply let the session fail and then use BGP Persistence
> or GR ;)

Please observe that when the session is down you are not receiving withdraws or new best paths for those "good" prefixes (maybe 99% of them) which did not have any errors in their respective update messages.

Equating it with the persistence proposal is therefore highly incorrect.

> I also do not fully understand "treat as withdraw". Does this mean that
> the peer that has received an update with P1-PN with a malformed attr then
> initiates a withdrawal to all of its peers?  Or does it simply assume that
> the paths have been received as a message?  Some sample topologies as to how
> this works would be a good addition to this section.

A speaker reacting to an error that can be handled by "treat-as-withdraw" locally invalidates the prefixes received in that update message and runs local best path selection. If no other path is found for a prefix, it withdraws that prefix from all peers it previously sent it to.
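
To make the sequence above concrete, here is a minimal Python sketch of it. It is only an illustration, not how any real implementation or the draft specifies it: the `Speaker` class, its fields, and the toy tie-break (shortest AS path, then peer name) are all assumptions made up for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Speaker:
    # Hypothetical Adj-RIB-In: prefix -> {peer: AS path tuple}
    rib_in: dict = field(default_factory=dict)
    # Currently selected best path: prefix -> (peer, AS path)
    best: dict = field(default_factory=dict)
    # Record of what was sent to downstream peers: (action, prefix)
    sent: list = field(default_factory=list)

    def learn(self, peer, prefix, path):
        """Accept a well-formed path from a peer and rerun best path."""
        self.rib_in.setdefault(prefix, {})[peer] = path
        self._run_best_path(prefix)

    def treat_as_withdraw(self, peer, prefixes):
        """Invalidate only the prefixes carried in the malformed UPDATE."""
        for prefix in prefixes:
            self.rib_in.get(prefix, {}).pop(peer, None)
            self._run_best_path(prefix)

    def _run_best_path(self, prefix):
        candidates = self.rib_in.get(prefix, {})
        old = self.best.get(prefix)
        if not candidates:
            # No other path left: withdraw, but only if we had advertised one.
            if old is not None:
                del self.best[prefix]
                self.sent.append(("withdraw", prefix))
            return
        # Toy selection (an assumption): shortest AS path, then peer name.
        peer = min(candidates, key=lambda p: (len(candidates[p]), p))
        new = (peer, candidates[peer])
        if new != old:
            self.best[prefix] = new
            self.sent.append(("announce", prefix))

s = Speaker()
s.learn("peerA", "10.0.0.0/8", (65001, 65002))
s.learn("peerB", "10.0.0.0/8", (65003, 65004, 65005))
s.learn("peerA", "192.0.2.0/24", (65001,))
s.sent.clear()  # ignore the initial announcements

# peerA sends one UPDATE with a malformed attribute covering both prefixes.
s.treat_as_withdraw("peerA", ["10.0.0.0/8", "192.0.2.0/24"])
print(s.sent)
# [('announce', '10.0.0.0/8'), ('withdraw', '192.0.2.0/24')]
```

In this toy run, 10.0.0.0/8 falls back to the path via peerB and is re-announced, while 192.0.2.0/24 has no alternative and is withdrawn downstream. The session to peerA stays up throughout, so updates for all other prefixes keep flowing.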

> I am not in support of solutions which create a scenario where BGP
> cannot recover without human intervention.

I think no one is. But we are, I think, not there yet: routers cannot automatically fix their own bugs, we can only automatically signal the requested action to them ;(.

> Nothing is going to get people's attention like a failed BGP
> Session..

True statement. But the entire assumption behind treat-as-withdraw is that your ops scripts parse the syslog messages indicating the issue and raise them to the NOC with the same red color and buzz as a BGP session down. Of course, you need to rework your ops scripts/alarms for that to happen.
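
As a rough sketch of what such a reworked ops script might look like, the Python below gives a treat-as-withdraw log line the same severity as a session-down line. The message formats and the `classify` helper are hypothetical; real syslog text differs per vendor and release, so the patterns would have to be adapted.

```python
import re

# Hypothetical message formats, invented for this example only.
PATTERNS = [
    (re.compile(r"BGP.*neighbor (?P<peer>\S+) Down", re.I), "session-down"),
    (re.compile(r"BGP.*neighbor (?P<peer>\S+).*treat-as-withdraw", re.I),
     "treat-as-withdraw"),
]

# Both events get the same "red" alarm level in the NOC.
CRITICAL_EVENTS = {"session-down", "treat-as-withdraw"}

def classify(line):
    """Return an alarm dict for a matching syslog line, else None."""
    for pattern, event in PATTERNS:
        match = pattern.search(line)
        if match:
            severity = "critical" if event in CRITICAL_EVENTS else "info"
            return {"event": event,
                    "peer": match.group("peer"),
                    "severity": severity}
    return None

down = classify("BGP-NOTIFY: neighbor 192.0.2.1 Down")
taw = classify("BGP-ERR: neighbor 192.0.2.1 malformed attribute, "
               "treat-as-withdraw applied to 3 prefixes")
other = classify("OSPF: neighbor 10.0.0.1 Down")
```

Here `down` and `taw` both come back with severity "critical", and `other` is `None` because the line does not mention BGP.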

Rgs,
R.

PS.

Note that if the main BGP session is down (as in the persistence case), BGP Operational Messages can no longer be exchanged between peers, as the TCP connection could have been reset (if no multisession is used and we are talking about a single SAFI). That just makes the issue worse, especially when you do not want human intervention.




_______________________________________________
Idr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/idr

_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
