Hi Maria, Hi Toke,

On Tue, Feb 28, 2023 at 02:07:06PM +0100, Maria Matejka via Bird-users wrote:
> > > I think it's probably simpler to just re-announce any route that's still
> > > converging every time we go through the routing table.
> >
> > Simpler, yes, but I do want to be able to maintain internet scale routing
> > tables through babel eventually so slashing every little bit helps :)
>
> In version 2, update of non-best route is propagated only to some protocols
> like pipes, add-path BGPs and alike.

Ok, that's good for either approach.

> In version 3, this is even more smoothed as all updates of one prefix are
> exported asynchronously to each protocol, being notified after your Babel
> ends the task (event, socket, timer), dampening best route oscillation or
> other flaps.

I don't quite understand why this would damp oscillations. Do you mean
there's explicit route flap damping support in v3, or just that this is a
side effect of the new async world? I'd like to know more about either :)

If we do have actual BGP-style damping in nest in v3, I'm not sure there's
much point in doing essentially the same thing in our proto. At the very
least that would be a good reason to keep the Babel-specific damping easy
to remove if it's about to be superseded by direct nest support anyway.

> This way, I'm not so scared about Babel periodically updating many routes.
> BIRD has to withstand it.

I still think doing unnecessary work/computation is just dumb if we can
avoid it :)

On Tue, Feb 28, 2023 at 04:45:35PM +0100, Toke Høiland-Jørgensen wrote:
> Right, sure, that's a nice property, but I'm not actually sure how what
> you sketched out above accomplishes that?

Don't worry, I'll send an RFC patch soon if I can make it work out. I just
got tied up by some (mild) covid.

> > Simpler, yes, but I do want to be able to maintain internet scale routing
> > tables through babel eventually so slashing every little bit helps :)
>
> Heh, yeah, I would like to eventually be able to do that as well, but I
> think there are other optimisations we need to do first. For instance,
> walking the entire routing table every second is not going to work in
> the first place in this case :)

True, but I might as well throw myself at this RTT stuff while I have the
time and energy. Large scale route table performance testing will have to
wait for another day, since there's not much point making it performant if
the features I want/need aren't supported (and performant! haha).

> >> Bear in mind that the currently selected route can also be converging, so
> >> predicting when two routes "cross" gets complicated quickly. Simpler to
> >> just do a periodic update and redo the comparison every time this update
> >> happens.
> >
> > I feel like that's an artifact specific to the "metric smoothing" approach
> > to route dampening not a feature though. The way I see it the behaviour we
> > really want is to delay any change in selected route for a time related to
> > the metric difference.
> >
> > Think back to what the purpose of the metric smoothing is in the first
> > place: to limit oscillations of the selected route, which this will do just
> > as well.
>
> I'm OK with finding another solution, but I think you're going to have
> to explain in more detail how what you propose actually represents such
> a solution, then :)

Will do. I've been looking through some of the literature on network
stability under dynamic routing to see if there's any well-founded science
we can apply here. I haven't really found anything good yet. The RTT paper
does admit that "we lack an in-depth theoretical understanding of the
performance of our algorithm, in particular of its stability." ;)

There is one thing I'm unsure about: does the delay before propagating a
route change to the kernel FIB actually have to depend on the metric
difference to provide the network stability properties we're looking for?
I think that strictly for stability a fixed delay should be fine, despite
not being optimal in terms of convergence time.

> > I don't agree with that. It's not as if I want per-hop information. Just a
> > sum of RTTs along the path and a sum of administrative metric along the
> > path rather than have those jumbled together into one number.
> >
> > Since babel is quite flexible in the actual metric math that would allow
> > interesting ways of weighing each metric component rather than just having
> > everything be linear.
>
> It also introduces dependencies, though.
> I.e., with the current approach
> you can have a subset of the routers speak the RTT extension, and other
> parts of the network will just have that incorporated into the metric.
> Whereas if it is carried as a separate metric your entire network has to
> know about the extension for it to be useful.

I don't see why you couldn't do both: incorporate the RTT (or other
measures) into the metric for oblivious nodes, and expose optional TLVs for
the ones that care about the different components.

> > For debugging this would be useful as you can see that this path in front
> > of you actually has a crazy RTT rather than someone just having fiddled
> > with their rxcost.
>
> Meh, not convinced that the routing protocol is the right place to get
> such debugging information. I'd rather just monitor the actual traffic :)

I just put myself into the frame of mind of "what if Babel were used on
the internet instead of eBGP". That would mean having to convince lazy
admins to run some weird additional software on their nodes, or dealing
with black-box vendors who won't cooperate because they want to sell
everyone their full network observability platform instead. It seems
preferable to just have some more debuggability right in the protocol, no?

If you're getting at the fact that you'd just do some passive TCP header
sniffing, do consider what happens when QUIC is widely deployed and that
gets a whole lot harder :P

> > yikes. Don't want to go down that road, got enough of these lookups in
> > rt_notify already :)
>
> Right, but then we do need to put the smoothed metric into an attribute
> if it's to be used in the comparison. But maybe you can explain how
> that's not really need cf the above.

Right, and my first attempt was doing that, before I came up with the new
approach.

--Daniel
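
P.S. To make the fixed-delay question above a bit more concrete, here is a
minimal Python sketch of what I mean by delaying a change of the selected
route. It is not the RFC patch and not BIRD code: the class name
HoldDownSelector, the hold_time parameter and the route names are all made
up for illustration, and it assumes lower metrics are better and that a
withdrawn route should be replaced immediately.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HoldDownSelector:
    """Hypothetical sketch: only switch the selected route once a strictly
    better candidate has stayed better for hold_time seconds."""
    hold_time: float                    # fixed delay (assumed parameter)
    selected: Optional[str] = None      # route currently installed
    candidate: Optional[str] = None     # better route waiting out the delay
    candidate_since: float = 0.0        # time the candidate first became best

    def update(self, now: float, metrics: Dict[str, int]) -> Optional[str]:
        """Feed the current per-route metrics; return the route to install."""
        if not metrics:
            self.selected = self.candidate = None
            return None
        # Lowest metric wins; ties are broken by name to stay deterministic.
        best = min(metrics, key=lambda r: (metrics[r], r))
        # Nothing installed yet, or the installed route was withdrawn:
        # there is nothing to hold on to, so switch right away.
        if self.selected not in metrics:
            self.selected, self.candidate = best, None
            return self.selected
        # Installed route is still (tied for) best: cancel any pending switch.
        if metrics[self.selected] <= metrics[best]:
            self.candidate = None
            return self.selected
        # A different route is strictly better: start or continue its timer.
        if self.candidate != best:
            self.candidate, self.candidate_since = best, now
        if now - self.candidate_since >= self.hold_time:
            self.selected, self.candidate = best, None
        return self.selected
```

Calling update() once per periodic pass with the current metrics would
leave the installed route alone through a short flap and only move to the
other route after it has been better for hold_time seconds. Making the
delay depend on the metric difference would just mean computing hold_time
from the two metrics instead of using a constant.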
