Hoi colleagues,

Juliusz, feel free to use my site as a backref for your research group. I will probably write one more article once the in-flight changes have been merged and I roll out Babel with VPP in production [fingers crossed!]. I'll update the list once I do.

Thank you very much for engaging with me! I'll collate a few answers in one reply.

On 11.03.24 12:02, Juliusz Chroboczek wrote:
> I find it interesting that you find Babel useful on a carrier network.
> Since I've been working under the impression that IS-IS is absolutely
> perfect (TM) in carrier networks, we have no experience whatsoever with
> that case.
Perhaps I should clarify. My network AS8298 is built using point-to-point carrier ethernet links provided by AS25091. They have physical (DWDM, dark fiber, etc.) links, and use OSPF and LDP to signal ethernet pseudowires for me. If an inter-city link fails, their OSPF reroutes, and my traffic goes via a different path.

The problem is this: AS8298's IGP typically doesn't notice this. We run BFD with a reasonably lax timeout of 3.0s, and the convergence of the underlying network is pretty quick. Also, if a backbone link at AS25091 goes down, that typically means nothing for my ethernet link to their routers -- in other words: my IGP stays up, fully converged, but one link all of a sudden goes from 5ms (Frankfurt-Zurich) to 35ms (Frankfurt-Amsterdam-Paris-Geneva-Zurich).
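(For the record, a 3.0s BFD detection time would, in BIRD2 terms, look something like the sketch below -- e.g. 1s probes with a multiplier of 3; the interface pattern "be*" is hypothetical:

    protocol bfd {
      interface "be*" {
        min rx interval 1000 ms;  # ask the peer to send probes no faster than 1/s
        min tx interval 1000 ms;  # send our own probes once per second
        multiplier 3;             # 3 missed probes => ~3.0s detection time
      };
    }
)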

I think Babel, for me at AS8298, will address this issue and move traffic away from the now-high-latency link.

On 11.03.24 12:51, Daniel Gröber wrote:
>> 1) Is there any advice you could offer for rtt cost/min/max/decay values
>> when using Bird2 ?
> The defaults should be fine honestly. While I have recommended changing
> them in the past I fear I don't (yet) fully understand the impact of that
> change on stability so best to leave them as is for now.
If I go this route, what I think I will need to know is the normally expected city-to-city latency (using my AS25091-provided point-to-point ethernet VPWS) and the alternate latency (when AS25091 needs to reroute), and force a higher cost in the latter case. I realize a goal ought to be minimizing changes to the costs and topology, because on top of the IGP I will have a full table of 950K/200K (IPv4/IPv6) prefixes, as these routers are in the DFZ. One cool thing is that VPP will consume a full table in about 7 seconds, including programming the FIB.

> 1) Bird's proto/babel doesn't have good policy controls right now. If you
> need any sort of control over your IGP announcements for TE or what have
> you, things might get tricky. I do have a patch ready to begin fixing that
> but unfortunately it's in limbo until BIRD v3 shakes out or we find some
> funding/motivation to push a port to v3 forward.
Understood. Luckily, I don't do traffic engineering with OSPF currently, and I'm OK leaving that off for now.
> Babeld does have (most of) the knobs I think you'd need but it's just not
> suitable for 24/7 operation outside of toy networks without major rework
> (sorry Juliusz!).
I think I will only be using Bird2 at AS8298. It is a production network after all :)
> 2) When a prefix is no longer reachable babel will insert an unreachable
> route for it until some timeout expires. I don't recall the details off
> hand but I'm sure Juliusz will jump in here ;)
That's acceptable for VPP. It picks up unreachable (and blackhole) routes and programs them correctly in the FIB. Thank you for pointing it out though!

On 11.03.24 23:09, Juliusz Chroboczek wrote:
>> 1) Is there any advice you could offer for rtt cost/min/max/decay values when
>> using Bird2 ?
> 1. RTT-MIN
>
> In the ideal case, your network consists of a number of interconnected
> clusters.  For example, if you have routers in Berlin, Paris and Warsaw,
> then each of the cities constitutes a cluster.  Within each cluster, it
> doesn't make sense to choose routes based on RTT, since small RTT values
> tend to be noisy and cause instability.
Agreed. Within the metro, reroutes are "free of charge" latency-wise.
> Rtt-min should be a value that is more than the intra-cluster latency but
> less than the inter-cluster latency.  For example, if latency within each
> cluster is on the order of 5ms, and the inter-cluster latency is 20ms,
> then the default value of 10ms is fine.
I think this is what I'm looking for. Once the latency from ZRH-FRA is established at 5ms, but a link failure drives that up to 30ms, I can play with an rtt-min of >>5ms (to account for jitter and variance) but <30ms, so that the cost rises only when strictly necessary.
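In BIRD2 terms I imagine something like the following sketch, assuming BIRD >= 2.13 (which added the Babel RTT extension); the interface name and the exact values are illustrative:

    protocol babel {
      interface "be1" {    # ZRH-FRA VPWS, normally ~5ms
        type wired;
        rtt cost 96;       # max penalty, phased in between rtt min and rtt max
        rtt min 15 ms;     # comfortably above the normal 5ms plus jitter
        rtt max 30 ms;     # full penalty at the ~30ms rerouted latency
      };
    }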
> Large values of rtt-min improve stability in the presence of bufferbloat.
>
>
> 2. RTT-MAX
>
> Symmetrically to rtt-min, rtt-max is the value above which links are
> considered bad.  It should be slightly larger than the largest RTT in your
> network.  Set it as small as possible in your network, since it has
> a dramatic effect on stability in the presence of bufferbloat.
>
> The default is 120ms, which is very conservative, but already has
> a big effect on improving stability in bufferbloated networks.
I think for almost all pairs of router adjacencies, 120ms will rarely if ever be reached. To confirm though: I am free to use different values of rtt-min and rtt-max per interface, right? I have a router in California at 150ms normal latency, so there rtt-min would be 170ms and rtt-max might want to be 250ms or something higher than that.
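If so, I'd end up with per-interface stanzas along these lines (again a sketch; interface names are hypothetical):

    interface "be1" {      # ZRH-FRA, ~5ms normal latency
      rtt min 15 ms;
      rtt max 30 ms;
    };
    interface "be9" {      # Zurich-California, ~150ms normal latency
      rtt min 170 ms;
      rtt max 250 ms;
    };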
> 3. MAX-RTT-PENALTY (rtt cost in BIRD)
>
> This is the maximum cost penalty that will be applied to high-RTT links.
> The default (96) is rather conservative, it will cause one high-RTT link
> to be equivalent to two low-RTT links.
Perhaps you can confirm my understanding. Consider a ring of routers ZRH-FRA-AMS-LIL-PAR-GVA-{ZRH}; degradation of any given link here should force traffic to go the other way around the ring. So say ZRH-FRA degrades: for the alternate path ZRH-GVA-PAR-LIL-AMS-FRA to win, the link that used to cost 96 would now have to cost more than 5x96. So I think my max penalty cost should be 500, so that the ZRH-FRA link loses out to the alternate at 480. Is that how you see it as well?
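Spelling my arithmetic out, assuming every link carries the base cost of 96 and the RTT penalty is added on top of it:

    alternate path (5 links):            5 x 96   = 480
    degraded link, rtt cost 500:         96 + 500 = 596  -> ring path wins
    degraded link, default rtt cost 96:  96 + 96  = 192  -> degraded link still wins

So any penalty above 480 - 96 = 384 should do it; 500 just leaves some margin.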

On 11.03.24 13:26, Dave Taht wrote:
> He also did a nice writeup of an inexpensive 32x100Gbit switch
> recently, running... debian.
>
> https://www.linkedin.com/pulse/debian-mellanox-100g-switch-pim-van-pelt-3pivf/
Thank you for the plug, Dave :) If you're not on LinkedIn, that article (and others) is primarily published at:
https://ipng.ch/s/articles/2023/11/11/mellanox-sn2700.html

That particular one was fun for me because I really just read their own docs and published my findings after playing around with the switch, Debian, and Switchdev. They were suitably enamored that they gave me a call to meet with the silicon, ethernet, and switchdev teams :-)

And if you're still reading, an update from me:
- I had a conversation with the VPP developers about accepting two patches to VPP, one of which allows the original premise of my article (IPv4-less transit networks, using loopback /32 only) to work.
- The other is to enable/allow point-to-point links over ethernet to reply to ARP (which is what I see most platforms I am familiar with already do). It sparked a bit of discussion, but I'm hopeful that it can be merged.
- I also toyed a bit more with IPv4-less OSPFv2 (not quite there yet with VPP), but since that's probably more a topic for bird-users, I'll spare you the gory details.


groet,
Pim


--

Pim van Pelt <[email protected]>
PBVP1-RIPE - https://ipng.ch/
_______________________________________________
Babel-users mailing list
[email protected]
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users