Camilo,

On Fri, Mar 21, 2025 at 08:19:57PM -0500, Camilo Cardona wrote:
> Yes, and not only for the backup paths, we also have options for marking
> non-selected paths and their churn might be even worse.

I should have been more precise. When I typed "backup", it would have been better to say "non-active paths".

> We know that this might be complicated for the devices. Section 3. explains
> that the reason code should be optional, and devices should provide options
> for enable or disable the reason code. Do you think there are other
> implementation guidelines that we can consider that would facilitate this on
> the devices?

In addition to looking at the draft again as part of inbox cleanup, I spent a few moments of under-slept time while traveling home from IETF-122 pondering what my employer's implementation would need to do to support the draft. I suspect many of my conclusions would apply to other implementations.

Consider the case where the system is in its initial route-learning state. For simplicity, assume we have N (N >= 2) feeds of the Internet. During route learning, each newly learned route may cause the prior active route to lose best-path status, so a given prefix can see up to N-1 best-path changes. With a fast enough loc-rib feed, this churn would be visible in many circumstances, depending on the route properties. In practice, it doesn't make it all the way to a monitoring station because of state compression. Our implementation also prioritizes advertising loc-rib after rib-in, which further helps suppress the churn.

With path marking on rib-in in the scenario above, we not only have to report the newly learned route, but must also eventually enqueue the route that just lost best-path status so that it carries the path-marking TLV. We are lucky that in most circumstances, once a route has lost best-path status, the reason is likely to stay fairly consistent. Even so, in the worst case above we have to advertise O(2*N) rib-in messages during learning rather than O(N).
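
A rough sketch of the worst-case arithmetic above, for a single prefix. It assumes every newly arriving path wins best-path selection and deliberately ignores state compression; the function name and return shape are hypothetical, purely for illustration.

```python
def worst_case_churn(n_feeds: int, path_marking: bool) -> dict:
    """Count rib-in messages and best-path changes for one prefix.

    Hypothetical worst-case model: paths from n_feeds peers arrive one at a
    time and every arrival wins best-path selection. State compression is
    NOT modelled.
    """
    if n_feeds < 1:
        raise ValueError("need at least one feed")
    rib_in_msgs = 0
    best_changes = 0
    best_peer = None
    for peer in range(n_feeds):
        rib_in_msgs += 1          # report the newly learned path
        if best_peer is not None:
            best_changes += 1     # the prior active path loses best-path status
            if path_marking:
                rib_in_msgs += 1  # re-send the displaced path with its new marking
        best_peer = peer          # the new arrival becomes the active path
    return {"rib_in": rib_in_msgs, "best_changes": best_changes}


for n in (2, 4, 8):
    plain = worst_case_churn(n, path_marking=False)["rib_in"]
    marked = worst_case_churn(n, path_marking=True)["rib_in"]
    print(f"N={n}: {plain} rib-in messages without marking, {marked} with")
```

For N = 2, 4 and 8 this yields 3, 7 and 15 rib-in messages with marking versus 2, 4 and 8 without, i.e. 2N-1 versus N, while the number of best-path changes stays at N-1 in both cases.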

There's also the concern that if we were doing active path marking during route learning, passing the reason for each change to the feed as it happens, we might delay End-of-RIB status for the learned routes. That's perhaps problematic.

So, what if the marking happened somewhat later to avoid churning the system? What could this look like? One answer would be that the inactive paths have their rib-in entries re-queued for BMP advertisement with the path-marking TLV. If N-1 paths are re-sent for the entire rib-in, that's still substantial traffic.

An interesting related question is what timestamp should be used in the per-peer header. RFC 7854 suggests it's the route-learning time. As we discussed during GROW at IETF 122, one's "faith" in timestamp accuracy may be low, but it's perhaps mostly good enough for rib-in in most implementations. Very likely we'd want to use the original learning timestamp in such a "path-marking status only update". That could permit receiving stations to avoid treating this metadata update as actual path churn.

For the hypothetical situation above, the active path may churn much less than N-1 times. So perhaps this isn't as problematic as it could be?

A second scenario to consider is a change of policy, or even of IGP cost due to IGP churn. For a policy change impacting the rib-in (import policy), not only must the rib-in-post view be updated for the peer whose policy changed, but the other peers' rib-in views must also be updated to reflect a potentially new reason the route is now inactive. IGP cost was not previously reflected in the rib-in-post view. With path marking, however, such a change may now directly manifest as status changes.

-----

Overall, even for a few simple situations, the churn in BMP appears to be significantly higher. I'm not going to press the point that it is impossible to reflect these things in the protocol. What I'm curious about is whether users of BMP find this churn problematic or not.
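
To make the deferred-marking idea concrete, here is a hypothetical sketch. Only the per-peer header timestamp comes from RFC 7854; the classes, field names, reason strings and receiver heuristic are all assumptions for illustration. After route learning settles, each non-active rib-in path is re-queued as a "status only" update that reuses its original learning timestamp, and a receiver treats an unchanged timestamp for the same peer and prefix as metadata rather than churn.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RibInPath:
    # Hypothetical rib-in entry; field names are illustrative, not from the draft.
    peer: str
    prefix: str
    learned_at: float              # original route-learning time
    active: bool
    reason: Optional[str] = None   # hypothetical non-active reason label


@dataclass(frozen=True)
class StatusUpdate:
    # Hypothetical "path-marking status only" update queued for BMP.
    peer: str
    prefix: str
    timestamp: float               # value placed in the per-peer header
    reason: str


def deferred_marking_pass(rib_in: List[RibInPath]) -> List[StatusUpdate]:
    """Re-queue every non-active path once route learning has settled.

    Each update reuses the path's original learning timestamp, not "now".
    """
    return [
        StatusUpdate(p.peer, p.prefix, p.learned_at, p.reason or "unspecified")
        for p in rib_in
        if not p.active
    ]


def classify(seen: Dict[Tuple[str, str], float], update: StatusUpdate) -> str:
    """Receiver-side heuristic: an unchanged timestamp means metadata, not churn."""
    key = (update.peer, update.prefix)
    if seen.get(key) == update.timestamp:
        return "metadata"
    seen[key] = update.timestamp
    return "churn"


rib_in = [
    RibInPath("peer-a", "192.0.2.0/24", learned_at=100.0, active=True),
    RibInPath("peer-b", "192.0.2.0/24", learned_at=101.0, active=False,
              reason="lost-on-local-pref"),
]
# The station already recorded peer-b's path from the initial announcement.
seen = {("peer-b", "192.0.2.0/24"): 101.0}
for update in deferred_marking_pass(rib_in):
    print(update.peer, update.reason, classify(seen, update))
```

Here only peer-b's path is re-sent, and the station classifies it as metadata. The heuristic is only as good as the sender's discipline in reusing timestamps and, as noted above, one's faith in timestamp accuracy.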

Do the authors have implementations of this feature, and can they share their observations about these potential consequences of the extension and how they've mitigated them?

-- 
Jeff
