You can easily introduce all sorts of issues in any routing daemon using the rtod tool (https://github.com/dtaht/rtod).
I've certainly hit this one. I really wish more folk tested meshy routing daemons with 4k+ routes.

One cure is to leverage the same qsort technique added elsewhere in babel in this component. You can look at the git diff from each addition of qsort to grok what to change. Big help. (I also saw some benefit in using an inline qsort, which, while ugly and using a macro, lets you keep the base compared (IPv6) value in registers, which helped a lot. I have some code for this lying around somewhere.)

Another thought (resend.c has issues too, which are not easily resolved by qsort) was to switch to using hashing throughout. I gave uthash (my most commonly used C hashing lib) a shot on resend.c, which worked pretty well, but I felt the overhead was too high and started to work with the less loved but lighter hhash instead, then dug a hole for myself in wanting to use "route tags" instead of full-blown IP and IPv6 addresses everywhere, then timerwheels, then gave up.

There's new support for switching nexthops in modern kernels worth leveraging. Threads might help...

I have a couple days a year to muck with babel, tops. I was hoping someone would be inspired to do a Rust version, because that's the only thing I think could be competitive with the C version, easier to extend, and could perhaps attract funding. The Go version stalled out, at least in part, because at the time the kernel netlink interface for Go sucked. Go is finally getting smaller shared libs, but I figure the GC will suck far worse than the C version does... and has anyone ever played with librcu??

I'd set a goal for myself a few years back of 64k routes (city scale networking: http://the-edge.taht.net/post/gilmores_list/), and then started running into congestion control issues...

There's an awful lot we could do to make babeld awesome, just ENOTIME.

On Sun, May 31, 2020 at 5:04 PM Fabian Bläse <[email protected]> wrote:
>
> Hi Johannes,
>
> thanks again for the analysis!
> I made the mistake of not inspecting kinstall_route closer. For some reason I
> thought that this only does the actual netlink communication.
>
> Your guess actually could explain the behaviour I've seen very well.
> Installing routes takes a very long time, but only if all routes are
> installed already.
> Therefore it is relatively easy for the node to initially connect to the
> network, because there are less routes to compare to, when they are received
> and installed for the first time.
>
> As I've already said it might be possible that I've mixed up babeld version
> numbers, so I analyzed versions with a known issue.
> So it is very possible that this issue does not originate from babeld-1.9.x,
> but our network just got too big at a very unfortunate time.
>
> Regards,
> Fabian
>
> _______________________________________________
> Babel-users mailing list
> [email protected]
> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users

--
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

[email protected] <Dave Täht>
CTO, TekLibre, LLC
Tel: 1-831-435-0729
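
For anyone wanting to try the sorted-table idea discussed above, here is a minimal sketch in C of keeping a table sorted with qsort() so that lookups can use bsearch() instead of a linear scan. It is not babeld's code: struct route_key, route_key_compare, route_table_sort and route_table_find are hypothetical names made up for illustration, and a real route entry carries far more state than a prefix and a length.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical route key (an assumption for this sketch, not babeld's
   layout): a 16-byte IPv6 prefix plus its prefix length. */
struct route_key {
    unsigned char prefix[16];
    unsigned char plen;
};

/* Total order on keys: compare prefix bytes first, then prefix length.
   The same function is used for sorting and for searching. */
static int
route_key_compare(const void *a, const void *b)
{
    const struct route_key *ra = a, *rb = b;
    int rc = memcmp(ra->prefix, rb->prefix, 16);
    if(rc != 0)
        return rc;
    return (int)ra->plen - (int)rb->plen;
}

/* Sort the whole table, e.g. once after a batch of insertions. */
static void
route_table_sort(struct route_key *table, size_t n)
{
    qsort(table, n, sizeof(table[0]), route_key_compare);
}

/* Binary search in a table previously sorted with route_table_sort().
   Returns the index of the matching entry, or -1 if there is none. */
static long
route_table_find(const struct route_key *table, size_t n,
                 const struct route_key *key)
{
    const struct route_key *hit =
        bsearch(key, table, n, sizeof(table[0]), route_key_compare);
    return hit ? (long)(hit - table) : -1;
}
```

The trade here is one sort after a batch of changes in exchange for O(log n) lookups afterwards. Whether that pays off depends on how often the table changes relative to how often it is searched, which is worth measuring on a table of a few thousand routes before committing to it.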
