On Jan 1, 2013, at 12:27 PM, Chris Hall wrote:

> It is a truth universally acknowledged (AFAICS), that if NLRI in a
> broken UPDATE are treated-as-withdraw, that is no worse than
> session-reset and much to be preferred.
>
> So, treat-as-withdraw is a reasonable thing for any implementation to
> do, by default, where it can.
>
> The problem is that when things are broken, it may not be possible to
> identify all the NLRI -- some may be "lost". [At this point I
> recommend: "The Engineer", AA Milne.]

I've been following this draft and wanted to chime in with some concerns.

1) In the (big) "Internet" space, lost NLRI are a huge deal for customers.

2) When we see a software defect that results in a session being closed, we regularly need help from the vendor to identify the offending NLRI/message.

3) The biggest problems we've seen are where vendor A and vendor B (or their sub-variants that run another OS) behave differently with the same NLRI.

I've seen a small number of these cases over the past ~12 years and am very concerned about the amount of effort going into error-correcting the error handling system when one side has a software defect.

I want to make it clear that all these cases of attempting to resync at 0xffff etc. are correcting for a defect. Some of these defects resulted in an improperly formatted UPDATE message from a sender, and others were the result of problems on the receiver.

The instability this could introduce into an enterprise or large-scale network is of significant concern. To solve the "treat as withdraw" (aka: possibly create a per-node routing loop) problem, I would expect BGP users to demand a "periodic resend all NLRIs" feature to flush the state, creating further entropy in the system unnecessarily.

At some point, the equipment that is sending or receiving the BGP message will need to be maintained by the owner. Dropping the session is meant to draw attention to the problem, in the same way that others use ABORT or ASSERT in their code to handle an unexpected condition.

We cannot correct for every error, nor should we make that a goal, as the resulting error handling code quickly gets complex. (Speaking as someone who attempted to write a BGP implementation once... ugh.)

Asbestos suit on,

- Jared
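
For readers following the thread, the trade-off discussed above (treat-as-withdraw when every NLRI can be identified, session-reset when some may be "lost") can be illustrated with a minimal sketch. This is not taken from the draft or from any implementation. The function names `parse_ipv4_nlri` and `handle_broken_update` are hypothetical, and the sketch assumes the only input is an IPv4 NLRI field encoded as the usual <length, prefix> tuples (one length octet in bits, followed by the minimum number of prefix octets). It ignores path attributes, multiprotocol NLRI, and everything else a real receiver has to consider.

```python
from typing import List, Optional, Tuple

Prefix = Tuple[str, int]  # (dotted-quad network, prefix length in bits)


def parse_ipv4_nlri(data: bytes) -> Optional[List[Prefix]]:
    """Parse <length, prefix> tuples from an IPv4 NLRI field.

    Returns None if the field cannot be parsed completely, i.e. some
    NLRI may be "lost" and the receiver cannot know which routes the
    broken UPDATE referred to.
    """
    prefixes: List[Prefix] = []
    i = 0
    while i < len(data):
        bits = data[i]
        i += 1
        if bits > 32:                   # impossible length for IPv4
            return None
        nbytes = (bits + 7) // 8        # minimum octets holding `bits` bits
        if i + nbytes > len(data):      # prefix runs past the end of the field
            return None
        raw = data[i:i + nbytes] + bytes(4 - nbytes)  # pad to 4 octets
        i += nbytes
        prefixes.append((".".join(str(b) for b in raw), bits))
    return prefixes


def handle_broken_update(nlri_field: bytes) -> Tuple[str, List[Prefix]]:
    """Hypothetical policy for an UPDATE already known to be malformed.

    If every NLRI is identifiable, withdraw exactly those routes;
    otherwise fall back to resetting the session.
    """
    prefixes = parse_ipv4_nlri(nlri_field)
    if prefixes is None:
        return ("session-reset", [])
    return ("treat-as-withdraw", prefixes)
```

For example, `handle_broken_update(bytes([24, 192, 0, 2]))` identifies 192.0.2.0/24 and withdraws it, while a truncated field such as `bytes([24, 192, 0])` falls back to a session reset because the affected routes cannot be determined.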
