On 22 Jun 2012, at 19:00, Enke Chen wrote:
> Hi, folks:
>
> It might help the discussion to refresh ourselves about several large outages
> in the last few years that prompted the work on the error handling
> requirements and solutions:
>
> o issue with AS4_PATH that resulted in session resets multiple hops away
> (two separate incidents)
> o session reset triggered by a single route with a new attribute
>
> I remember that Rob had a presentation at the NANOG on the topic.
Hi Enke,
Thanks for this message. This work is absolutely motivated by incidents in live
networks (both those that have been described publicly and those in private
deployments that are less widely known).
I have spoken about this work a number of times across several operator forums:
- NANOG51 / LINX / UKNOF:
http://www.nanog.org/meetings/nanog51/presentations/Tuesday/shakir-bgp-error-handling_rob-shakir-FINAL2.pdf
http://www.nanog.org/meetings/nanog51/presentations/Tuesday/bgp_err_hdling.wmv
- Netnod:
http://rob.sh/files/RJS-Reinforcing_the_Kitchen_Sink-NETNOD-Autumn2011.pdf
The motivation for this work is improving the robustness of real-world
networks, where the current error handling behaviour does not match the
environments in which the protocol is deployed.
As I have voiced in other posts to both the IDR and GROW mailing lists, these
requirements put forward a means of balancing the risk of protocol operation
not being 100% correct against the threat of a complete outage for all routing
information carried via a BGP session. If these risks are not acceptable to an
operator, then the solutions implemented in answer to these requirements do not
need to be enabled. At the moment, however, the protocol is completely
constrained in terms of its behaviour: sessions are torn down regardless of the
impact of their failure.
Kind regards,
r.
_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow