Hi Babel community,

As I mentioned in another thread, I am curious about whether Babeld
can be adapted to work with a global full table.

No.

There is one long-standing issue with merging from the kernel table
that would benefit from a qsort.
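
A minimal sketch of why sorting helps with that kind of merge (this is not babeld's actual code, which is C; the tuple layout and the name diff_sorted are made up for illustration, and each list is assumed to contain no duplicates): once both route lists are sorted by the same key, the differences fall out of a single linear pass.

```python
def diff_sorted(kernel_routes, known_routes):
    """Return (added, removed) between two route lists via a sorted merge."""
    a = sorted(kernel_routes)   # routes currently in the kernel table
    b = sorted(known_routes)    # routes the daemon already knows about
    added, removed = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:        # present on both sides: nothing to do
            i += 1
            j += 1
        elif a[i] < b[j]:       # only in the kernel table
            added.append(a[i])
            i += 1
        else:                   # only in the daemon's list
            removed.append(b[j])
            j += 1
    added.extend(a[i:])         # leftovers exist on one side only
    removed.extend(b[j:])
    return added, removed

# Hypothetical routes as (prefix, prefix length) tuples.
kernel = [("2001:db8:2::", 48), ("2001:db8:1::", 48), ("2001:db8:3::", 48)]
known = [("2001:db8:1::", 48), ("2001:db8:4::", 48)]
added, removed = diff_sorted(kernel, known)
```

The sort costs O(n log n) and the merge O(n), versus roughly O(n * m) when every kernel route is looked up by walking an unsorted list.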

But you are going to

A) run out of bandwidth - 785k routes = ~14,000 babel packets (I
think that rounds to 1280 bytes/packet), and babel will want to
announce these every 4 seconds, so call it 44 Mbit/s? (Feel free to
check my math, it's Friday.) That's well above what I've ever seen
wifi multicast, in particular, achieve. And that's *per router*.

To get there, the announcement interval would have to be increased up
to at least a typical bgp interval (2 minutes) and even then...
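
The arithmetic in point A can be checked with a small sketch. It takes the packet count, packet size and intervals quoted above as given (they are estimates from this thread, not measurements) and ignores IP/UDP and link-layer overhead; the function name is made up for illustration.

```python
def announce_mbits_per_sec(packets, bytes_per_packet, interval_s):
    """Average rate needed to send one full table dump every interval_s seconds."""
    bits_per_dump = packets * bytes_per_packet * 8
    return bits_per_dump / interval_s / 1e6

PACKETS = 14_000          # estimate from the thread for 785k routes
BYTES_PER_PACKET = 1280   # estimate from the thread

every_4s = announce_mbits_per_sec(PACKETS, BYTES_PER_PACKET, 4)
every_2min = announce_mbits_per_sec(PACKETS, BYTES_PER_PACKET, 120)
print(f"{every_4s:.1f} Mbit/s at 4 s, {every_2min:.2f} Mbit/s at 120 s")
```

With those inputs the payload alone comes to roughly 36 Mbit/s at a 4 second interval, the same order of magnitude as the figure guessed above, and to about 1.2 Mbit/s at a 2 minute interval, in both cases per router.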

B) you run out of CPU - babeld uses linked lists, and also tries to
recalculate Bellman-Ford every 4 seconds. There's a need for a faster,
safer kernel interface.
I do not see a reason why we could not change the data structures to consume less CPU under the given scenario.

my rtod tests showed babeld typically falling over for any one of
these four reasons in well under 4k routes on low end mips and arm
hardware. Even the low end apu2 eats a whole cpu with about that many
(ipv6) routes.
What are the other two reasons?

I made a few sloppy computational improvements and so on while
developing the rtod test. I tried to upstream a few, but my then-current
employer wasn't happy with me working under anything but the Apache
license and didn't care, and I ran out of time and energy. I have to
admit I was hacking far more than programming.

I think making some version of babel (be it bird or frr ) scale well
to at least 64k routes would be a very good idea,
I agree.

and once things now entering it like unicast, and crypto, are stable,
it would be a GREAT thing to have a version that did that, but I fear
it will involve parallelizing hellos and Bellman-Ford and per-interface
threads, changes to the protocol to adapt the interval to the bandwidth
and cpu available, tcp friendly rate control (or swapping routes via
tcp), etc, etc.
I agree, as routers become more powerful and even low-end devices are emerging that feature multiple CPU cores, there might be a benefit. On another note - just parallelizing any algorithm (not specific to babeld) will only get you so far. The algorithm/data structures should be optimized first.


and a whole suite of other cool things that nobody has time, energy,
or sufficient programmers for. And it wouldn't be babeld anymore.

Bird's version of babel should perform mildly better, as it has
tighter code (xor rather than memcmp in one case I tried to upstream),
and a few other better algorithms overall, but I suspect few besides
me and john ( http://the-edge.taht.net/post/gilmores_list/ ) care
enough about city-scale routing to get anywhere.
You are forgetting the Freifunk communities in Germany. This is what they do: building city-wide wifi mesh networks, currently mostly with segmented batman. Now that my patchset for babeld integration has been merged into gluon (the framework which most communities use to build their networks), the babeld technology is available to a wider audience for their meshes. To get an impression of the size of the community, https://www.freifunk-karte.de/ might be an interesting start. Agreed, the development community is much smaller - still, I can see a dozen or so people contributing to gluon. It certainly could be worse.


I should probably try to extract more patches from my misguided
efforts, like this:

https://github.com/dtaht/rabeld/commit/b74b4a6f9b532717ee93346963efd894e94615b3

and I had a bpf filter that helped a lot, and I sunk time into
enabling sse and neon ins...

but I was mostly hoping the unicast/crypto/etc stuff would land in one
piece I could do all up testing on before tackling the scaling
problems, on someone else's time. I ended up deciding that I wanted to
rewrite it all from scratch, hit licensing and employer problems...
and time.....
I would appreciate that. We are just starting another test network for a city-wide mesh which will be based on babeld. Links that do not have wifi connectivity will be using wireguard as vpn. There is a significant speed improvement of that tech stack over batman+fastd. Let's see how much pull this gets. In any case: 64K routes should be working on current cheap routers in a network like that.

One of my environments uses BGP full-table from 3 upstream ISPs (each
with 785k routes currently).
  +----------+  +----------+  +----------+
  |Customer A|  |Customer B|  |Customer C|
  +----+-----+  +----+-----+  +----+-----+

  +----+----+    +---+---+   +-----+-----+
  |Edge Asia|----|Edge US|---|Edge Europe|
  ++-------++    +---+---+   +-----------+

+--+--+ +--+--+   +--+--+
|ISP A| |ISP B|   |ISP C|
+-----+ +-----+   +-----+
Babeld would simply refuse to run in this environment, blocking the
whole network without converging, with 100% CPU utilization.

Don't do that. It hurts when you do that. :) rtod is a way to get to
overload more gently.
What would have to be done to get confirmation of Juliusz' theory wrt the source of the load?

Christof

--
()  ascii ribbon campaign - against html e-mail
/\  against proprietary attachments

_______________________________________________
Babel-users mailing list
[email protected]
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users