Hi Babel community,
I do not see a reason why we could not change the data structures to consume less CPU under the given scenario.

As I mentioned in another thread, I am curious about whether Babeld can be adapted to work with global full-table.

No. There is one long standing issue with merging from the kernel table that would benefit from a qsort. But you are going to

A) run out of bandwidth - 785k routes = ~14,000 babel packets. I think that rounds to 1280 bytes/packet, so - and babel will want to announce these every 4 seconds - so call it 44mbits/sec? (feel free to check my math, it's friday). That's well above what I've ever seen wifi mcast in particular, achieve. And that's *per router*. To get there, the announcement interval would have to be increased up to at least a typical bgp interval (2 minutes) and even then...

B) you run out of cpu - babeld uses linked lists, and tries to recalc bellman-ford every 4 seconds also.

There's a need for a faster, safer kernel interface.
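
Since the estimate in A) invites a check, here is a minimal sketch of the arithmetic using only the figures quoted there (14,000 packets, 1280 bytes each, sent every 4 seconds or every 2 minutes). The function name is made up for illustration, and the calculation ignores IP/UDP and link-layer overhead, so a real figure would be somewhat higher. With these inputs the result lands nearer 36 Mbit/s than 44, which is the same order of magnitude and does not change the conclusion.

```python
# Back-of-the-envelope check of the bandwidth estimate in point A).
# All inputs are the figures quoted in the thread; announce_rate_mbit is a
# hypothetical helper name, not something taken from babeld.

def announce_rate_mbit(packets: int, bytes_per_packet: int, interval_s: float) -> float:
    """Average rate in Mbit/s needed to send `packets` once per `interval_s` seconds."""
    bits_per_round = packets * bytes_per_packet * 8
    return bits_per_round / interval_s / 1_000_000

ROUTES = 785_000
PACKETS = 14_000        # "~14,000 babel packets"
PACKET_BYTES = 1280     # "1280 bytes/packet"

routes_per_packet = ROUTES / PACKETS
every_4s = announce_rate_mbit(PACKETS, PACKET_BYTES, 4)      # babeld-style interval
every_2min = announce_rate_mbit(PACKETS, PACKET_BYTES, 120)  # BGP-like interval

print(f"routes per packet:  {routes_per_packet:.1f}")
print(f"4 s interval:       {every_4s:.2f} Mbit/s per router")
print(f"2 min interval:     {every_2min:.2f} Mbit/s per router")
```

Even at a 2 minute interval the full table still costs on the order of 1 Mbit/s per router, which fits the "and even then..." above.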
My rtod tests showed babeld typically falling over for any one of these four reasons in well under 4k routes on low-end MIPS and ARM hardware. Even the low-end APU2 eats a whole CPU with about that many (IPv6) routes.
What are the other two reasons?
I made a few sloppy computational improvements and so on while developing the rtod test. Tried to upstream a few; my then-current employer wasn't happy with me working under anything but the Apache license and didn't care, and I ran out of time and energy, and I have to admit I was hacking far more than programming.

I think making some version of babel (be it bird or frr) scale well to at least 64k routes would be a very good idea,
I agree.
I agree: as routers become more powerful and even low-end devices are emerging that feature multiple CPU cores, there might be a benefit. On another note - just parallelizing any algorithm (not specific to babeld) will only get you so far. The algorithm/data structures should be optimized first.

and once things now entering it like unicast, and crypto, are stable, it would be a GREAT thing to have a version that did that, but I fear it will involve parallelizing hellos and bellman-ford and per-interface threads, changes to the protocol to adapt the interval to the bandwidth and cpu available, tcp-friendly rate control (or swapping routes via tcp), etc, etc.
You are forgetting the Freifunk communities in Germany. This is what they do: building city-wide wifi mesh networks, currently mostly with segmented batman. Now that my patchset for babeld integration has been merged in gluon (the framework which most communities use to build their networks), the babeld technology is available to a wider audience for their meshes. To get an impression of the size of the community, https://www.freifunk-karte.de/ might be an interesting start. Agreed, the development community is much smaller - still, I can see a dozen or so people contributing to gluon. It certainly could be worse.

and a whole suite of other cool things that nobody has time, energy, or sufficient programmers for. And it wouldn't be babeld anymore. Bird's version of babel should perform mildly better, as it has tighter code (xor rather than memcmp in one case I tried to upstream), and a few other better algorithms overall, but I suspect few besides me and john ( http://the-edge.taht.net/post/gilmores_list/ ) care enough about city-scale routing to get anywhere.
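
The thread does not say which comparison the "xor rather than memcmp" change touched, so the following is only a generic illustration of the idea, not code from Bird or babeld: two addresses held as integers share a prefix of a given length exactly when the XOR of the two has no set bits in that leading part. The helper names `same_prefix` and `addr` are assumptions made for this sketch.

```python
import ipaddress

def same_prefix(a: int, b: int, plen: int, bits: int = 128) -> bool:
    """True if a and b agree on their first `plen` bits (out of `bits`).

    a ^ b has a 1 wherever the two addresses differ; shifting away the
    host part leaves only differences that fall inside the prefix.
    """
    return (a ^ b) >> (bits - plen) == 0

def addr(text: str) -> int:
    """Parse an IPv6 address into a 128-bit integer (illustrative helper)."""
    return int(ipaddress.IPv6Address(text))

# Same /64, different host part.
print(same_prefix(addr("2001:db8::1"), addr("2001:db8::ffff"), 64))
# Differ in the last bit of the first 32 bits.
print(same_prefix(addr("2001:db8::1"), addr("2001:db9::1"), 32))
```

In C the same test is a couple of word-sized XORs and a mask, which is where any saving over a byte-wise memcmp plus separate handling of the trailing partial byte would come from; whether that matters in practice depends on the surrounding code.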
I would appreciate that. We are just starting another test network for a city-wide mesh which will be based on babeld. Links that do not have wifi connectivity will use wireguard as the VPN. There is a significant speed improvement of that tech stack over batman+fastd. Let's see how much pull this gets. In any case: 64K routes should be working on current cheap routers in a network like that.

I should probably try to extract more patches from my misguided efforts, like this: https://github.com/dtaht/rabeld/commit/b74b4a6f9b532717ee93346963efd894e94615b3 and I had a bpf filter that helped a lot, and I sunk time into enabling sse and neon ins... but I was mostly hoping the unicast/crypto/etc stuff would land in one piece I could do all-up testing on before tackling the scaling problems, on someone else's time. I ended up deciding that I wanted to rewrite it all from scratch, hit licensing and employer problems... and time.....
One of my environments uses BGP full-table from 3 upstream ISPs (each with 785k routes currently).

+----------+  +----------+  +----------+
|Customer A|  |Customer B|  |Customer C|
+----+-----+  +----+-----+  +----+-----+
     |             |             |
+----+----+    +---+---+   +-----+-----+
|Edge Asia|----|Edge US|---|Edge Europe|
++-------++    +---+---+   +-----------+

+--+--+ +--+--+ +--+--+
|ISP A| |ISP B| |ISP C|
+-----+ +-----+ +-----+

Babeld would simply refuse to run on this environment, blocking the whole network without converging, with 100% CPU utilization.

Don't do that. It hurts when you do that. :) rtod is a way to get to overload more gently.

What would have to be done to get confirmation of Juliusz' theory wrt the source of the load?
Christof

--
()  ascii ribbon campaign - against html e-mail
/\  against proprietary attachments
_______________________________________________ Babel-users mailing list [email protected] https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users
