Re: [Babel-users] Bucket full, dropping packet
>> Matthieu, do you understand why that is? Is there a way to optimise away
>> conflict_solution in the easy case?
>
> I think so. Will fix it.

The attached patch should solve the problem. Since a conflict needs a specific route, the first loop now iterates over specific routes only. If there is no such route, the function exits immediately. Otherwise, we need to traverse the RIB for each specific route.

Further improvements may be possible (avoiding the malloc/free call), but I think this should be sufficient. So far it has been tested on our testbed network, and it doesn't seem to crash the network. Still, some review would be valuable before real deployment.

Matthieu

Attachment: 0001-Optimization-by-allowing-traversing-installed-specif.patch (binary data)

___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users
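The early exit described in this message can be sketched as follows. This is not the code from the patch: `struct rib_entry`, `conflicts()` and `count_conflicts()` are hypothetical names, and the conflict test is a deliberately simplified stand-in. The only things taken from the thread are the invariant that specific routes come first in the RIB and the idea that the outer loop stops as soon as it leaves the specific routes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified RIB entry. */
struct rib_entry {
    unsigned dst;   /* stand-in for the destination prefix */
    int src_plen;   /* > 0 marks a source-specific route */
};

/* Simplified stand-in for the real conflict test (an assumption):
   a specific route conflicts with a non-specific route to a
   different destination. */
static int conflicts(const struct rib_entry *s, const struct rib_entry *r)
{
    return s->src_plen > 0 && r->src_plen == 0 && s->dst != r->dst;
}

/* Assumes the RIB is sorted with all specific routes first.
   Returns the number of conflicting pairs; *visited counts the
   inner-loop steps so the cost of the easy case is visible. */
static size_t count_conflicts(const struct rib_entry *rib, size_t n,
                              size_t *visited)
{
    size_t i, j, found = 0;

    assert(visited != NULL);
    *visited = 0;
    /* The outer loop covers specific routes only; with none present
       it ends before the RIB is traversed at all. */
    for(i = 0; i < n && rib[i].src_plen > 0; i++) {
        for(j = 0; j < n; j++) {
            (*visited)++;
            if(conflicts(&rib[i], &rib[j]))
                found++;
        }
    }
    return found;
}
```

With s specific routes out of n, the work is s * n comparisons, so a RIB without specific routes costs nothing.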
Re: [Babel-users] Bucket full, dropping packet
> The attached patch should solve the problem.

Nice, applied.

> +if(!check_specific_first())
> +fprintf(stderr, "Invariant failed: specific routes first in RIB.\n");

#ifdef DEBUG, or if(debug_level >= 2) ?

-- Juliusz
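A minimal sketch of the second option, running the invariant check only at a high debug level. The names `check_specific_first` and `debug_level` come from the thread; everything else is an assumption made for illustration, including the `struct rib_entry` layout, the array-based signature (the real function takes no arguments) and the `debug_check_rib()` wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical minimal route record; only the field the check needs. */
struct rib_entry {
    int src_plen;   /* > 0 means source-specific */
};

/* Returns 1 if no specific route appears after a non-specific one. */
static int check_specific_first(const struct rib_entry *rib, size_t n)
{
    size_t i;
    int seen_plain = 0;

    assert(rib != NULL || n == 0);
    for(i = 0; i < n; i++) {
        if(rib[i].src_plen == 0)
            seen_plain = 1;
        else if(seen_plain)
            return 0;
    }
    return 1;
}

/* Runs the check only when debug_level >= 2, so normal operation does
   not pay for a full RIB walk. Returns 1 if the check ran and failed,
   0 otherwise. */
static int debug_check_rib(int debug_level, const struct rib_entry *rib,
                           size_t n)
{
    if(debug_level < 2)
        return 0;
    if(!check_specific_first(rib, n)) {
        fprintf(stderr, "Invariant failed: specific routes first in RIB.\n");
        return 1;
    }
    return 0;
}
```

The `#ifdef DEBUG` variant would remove the check from release builds entirely; the runtime test keeps a single binary at the cost of one integer comparison per call.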
Re: [Babel-users] Bucket full, dropping packet
On Mon, Dec 14, 2015 at 8:15 PM, Dave Taht wrote:
> Is there a reliable way of determining that an underlying interface is a
> bridge?

A local bridge/wifi? Sure... just look into the source of the brctl tool... it can give you a list of all bridge interfaces.

A "wifi bridge" in an external device? That would be a job for DLEP... ;)

Henning
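For the local case, one approach that should work on Linux is to look for the `bridge` directory that the kernel exposes under `/sys/class/net/<ifname>/` for bridge devices. The sketch below assumes sysfs is mounted at `/sys`; the function names `is_bridge()` and `is_bridge_at()` are hypothetical and this is not taken from brctl or babeld. It says nothing about a bridge that lives in an external device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if <sysfs_root>/class/net/<ifname>/bridge is a directory,
   0 if it is not, and -1 if the path does not fit in the buffer.
   The caller is expected to pass a plain interface name without '/'. */
static int is_bridge_at(const char *sysfs_root, const char *ifname)
{
    char path[256];
    struct stat st;
    int rc;

    assert(sysfs_root != NULL && ifname != NULL);
    rc = snprintf(path, sizeof(path), "%s/class/net/%s/bridge",
                  sysfs_root, ifname);
    if(rc < 0 || (size_t)rc >= sizeof(path))
        return -1;
    if(stat(path, &st) < 0)
        return 0;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Convenience wrapper for the usual mount point (an assumption). */
static int is_bridge(const char *ifname)
{
    return is_bridge_at("/sys", ifname);
}
```

Calling `is_bridge("br0")` on a host with a bridge named br0 should return 1; the root parameter exists only so the lookup can be exercised against a fake directory tree.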
Re: [Babel-users] Bucket full, dropping packet
> #ifdef DEBUG, or if(debug_level >= 2) ?

Well, I was not sure about this one. The problem with debug_level is that its output is too verbose; it's not just "checks". I was rather thinking of having a test-mode version of babeld, which leaves the clean babeld code as-is and adds some code around it, like mocks. (?)

Matthieu
Re: [Babel-users] Bucket full, dropping packet
Hi!

But it will be more complicated with upgrades, and more complicated to make sure that things are really the same (same patched version of Babel, same kernel version, TCP/IP stack, sysctl settings, etc.).

Mitar

On Sun, Dec 13, 2015 at 1:00 PM, Juliusz Chroboczek wrote:
>>> Ok, I'll see on Monday if I can get an extra VM before Christmas.
>>
>> Which VM system are you using? We might be able to generate you a
>> ready-made image.
>
> Please don't -- I'll let our system administrators clone their usual
> VMWare image, it's better for everyone if I use what they're familiar
> with.
>
> -- Juliusz

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Re: [Babel-users] Bucket full, dropping packet
> Matthieu, do you understand why that is? Is there a way to optimise away
> conflict_solution in the easy case?

I think so. Will fix it. I may call you beforehand.

Matthieu
Re: [Babel-users] Bucket full, dropping packet
On Sun, Dec 13, 2015 at 2:52 PM, Juliusz Chroboczek wrote:
>> Ok, I can do some profiling on the babeld that is running on the VPN
>> server with the large number of links. Just tell me what profiling data
>> you want. Should I just compile a debug build and run babeld through
>> callgrind or do you have something else in mind?
>
> I'm not familiar with callgrind -- I've had good results with both "perf
> record" and gprof. But yes, callgrind should be fine.
>
> I need to find out where the CPU time is going. I suspect either the
> quadratic loop in xroute.c, or linear-time route selection in route.c.
> I intend to fix both, but I'd like to be sure.
>
>> Yes, you only need to establish a VPN connection to our server using
>> tunneldigger-client [1] (it compiles on Debian) and run babeld on the
>> VPN interface. We only need to allocate an IPv4 address for you so there
>> will be no conflicts.
>
> Ok, I'll see on Monday if I can get an extra VM before Christmas.

I have half a dozen machines all over the world, courtesy of Linode. I can spin a new one up for you in a matter of minutes.

>
> -- Juliusz
Re: [Babel-users] Bucket full, dropping packet
On 13 December 2015 at 22:00, Juliusz Chroboczek <j...@pps.univ-paris-diderot.fr> wrote:
> >> Ok, I'll see on Monday if I can get an extra VM before Christmas.
> >
> > Which VM system are you using? We might be able to generate you a
> > ready-made image.
>
> Please don't -- I'll let our system administrators clone their usual
> VMWare image, it's better for everyone if I use what they're familiar
> with.

Hi Juliusz and all,

I have a VPS running babel and tunneldigger which is part of the Wlan Slovenia network. If it helps, I can give you, and anybody else who can troubleshoot, instant root access to this server; just provide me with your public ssh keys.

Thanks,
Valent.
Re: [Babel-users] Bucket full, dropping packet
>> Ok, I'll see on Monday if I can get an extra VM before Christmas.
>
> Which VM system are you using? We might be able to generate you a
> ready-made image.

Please don't -- I'll let our system administrators clone their usual VMWare image, it's better for everyone if I use what they're familiar with.

-- Juliusz
Re: [Babel-users] Bucket full, dropping packet
>> Oh, sadly not 75 max. The number of neighbours of one node is
>> potentially very large, because all nodes with an Internet uplink
>> connect to the VPN server. So, all those nodes are then neighbours of
>> that VPN server node. Currently this is for example 140 nodes on one

> So yes, 75 is the current number of Babel peers, but there are otherwise
> many more peers and there will be even more. How does babeld scale in
> this case (BTW, is this a problem with the protocol or the implementation)?

The implementation. The protocol should be able to scale very well.

Look at the code at xroute.c line 304 -- this is quadratic in the number of routes in the kernel, and is done every time the kernel tells us the set of routes has changed. This is easily fixed (the kernel provides us with deltas, we don't need to take a complete dump every time), but I need some profiling data before I spend time on that.

Another issue is that a Babel node keeps a redundant routing table -- it keeps routes that are not installed so it can recover from a failure immediately. The current implementation doesn't limit the amount of redundancy -- it will keep up to (v*n) routes. The obvious solution is to be just 3-redundant -- keep just the best 3 routes to a given destination. Obviously, this means that route selection will be suboptimal in some cases, and that recovery from loss will be slower, but something's got to give.

Mitar, Jernej -- this is exciting stuff, and I'm pretty sure we'll be able to solve it.

> But I am thinking that we do need some faster way to debug these
> issues than this ping-pong over the mailing list, with what should we
> run and then file back and so on. What about you getting one cheap
> TPLink, go to https://nodes.wlan-si.net/, register a node and deploy
> it?

Very good idea. Is it possible to use a Debian box? With all the profiling and debugging tools, it would be more useful.

-- Juliusz
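The "3-redundant" idea from this message can be sketched as a small per-destination table that keeps only the best few routes. This is an illustration under stated assumptions, not babeld's data structure: `struct kept_routes`, `offer_route()` and the integer neighbour ids are hypothetical, and the sketch ignores feasibility conditions and updates from a neighbour that is already in the table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEPT 3   /* "3-redundant": keep the best 3 routes */

/* Hypothetical per-destination table, sorted by increasing metric
   (lower is better). */
struct kept_routes {
    unsigned short metric[MAX_KEPT];
    int neighbour[MAX_KEPT];    /* illustrative neighbour id */
    size_t count;
};

/* Offers a route to the table. Returns 1 if it was retained,
   0 if it was worse than everything already kept. */
static int offer_route(struct kept_routes *k, int neighbour,
                       unsigned short metric)
{
    size_t pos, i;

    assert(k != NULL && k->count <= MAX_KEPT);
    /* Find the insertion point that keeps the array sorted. */
    for(pos = 0; pos < k->count; pos++)
        if(metric < k->metric[pos])
            break;
    if(pos == MAX_KEPT)
        return 0;

    if(k->count < MAX_KEPT)
        k->count++;
    /* Shift worse entries down; the worst one falls off the end. */
    for(i = k->count - 1; i > pos; i--) {
        k->metric[i] = k->metric[i - 1];
        k->neighbour[i] = k->neighbour[i - 1];
    }
    k->metric[pos] = metric;
    k->neighbour[pos] = neighbour;
    return 1;
}
```

With such a cap the stored state is bounded by 3 routes per destination regardless of how many neighbours announce it, which is the trade-off the message describes: less memory and work, at the price of occasionally discarding a route that would have been useful after a failure.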
Re: [Babel-users] Bucket full, dropping packet
Hello!

On 08. 12. 2015 17:58, Jernej Kos wrote:
> We are still on the lookout for unparsable packets ;-)

Got one!

Couldn't parse packet (8, 14) from fe80::2ff:1bff:fe10:3d34 on digger1438.

Packet dump: 08 0e 01 00 20 00 06 40 32 8e ff ff 0a fe 00 08

Jernej
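For readers who want to follow the dump by hand, here is a small decoder sketch. It assumes that the pair (8, 14) in the error message is the TLV type and length, which matches the first two bytes of the dump, and it lays the fields out as RFC 6126 describes the Update TLV (type 8). The names `struct update_tlv` and `parse_update()` are hypothetical; this is not babeld's parser.

```c
#include <assert.h>
#include <stddef.h>

/* Fields of a Babel Update TLV in wire order (RFC 6126). */
struct update_tlv {
    unsigned char ae, flags, plen, omitted;
    unsigned interval;              /* in centiseconds */
    unsigned seqno;
    unsigned metric;                /* 0xFFFF means retraction */
    const unsigned char *prefix;    /* points into the input buffer */
    size_t prefix_len;
};

/* Parses one TLV starting at buf. Returns 0 on success, -1 if the
   buffer is too short or the TLV is not an Update. */
static int parse_update(const unsigned char *buf, size_t buflen,
                        struct update_tlv *out)
{
    size_t len;

    assert(buf != NULL && out != NULL);
    if(buflen < 2 || buf[0] != 8)
        return -1;
    len = buf[1];
    /* 10 fixed bytes follow the type and length bytes. */
    if(len < 10 || buflen < 2 + len)
        return -1;
    out->ae = buf[2];
    out->flags = buf[3];
    out->plen = buf[4];
    out->omitted = buf[5];
    out->interval = ((unsigned)buf[6] << 8) | buf[7];
    out->seqno = ((unsigned)buf[8] << 8) | buf[9];
    out->metric = ((unsigned)buf[10] << 8) | buf[11];
    out->prefix = buf + 12;
    out->prefix_len = len - 10;
    return 0;
}
```

Read this way, the dump is an Update with AE 1 (IPv4), prefix length 32, interval 0x0640 (1600 centiseconds), seqno 0x328e, metric 0xffff (a retraction) and prefix bytes 0a fe 00 08, that is 10.254.0.8. The dump contains only the TLV, not the 4-byte packet header or any preceding TLVs.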
Re: [Babel-users] Bucket full, dropping packet
Hi!

Hm, I thought that Babel was tested on large networks and that it was tested on simulated large networks? Or are we now the largest network using it and this is why we are getting into all this trouble?

So this is just another academic project which looks good on paper but in practice is not really production grade?

We had to turn off Babel in the network and go back to OLSRv1. So much for a smooth transition.

Mitar

On Fri, Dec 11, 2015 at 10:29 AM, Jernej Kos wrote:
> Hello!
>
> On 11. 12. 2015 18:47, Matthieu Boutier wrote:
>> Do you know where this version comes from? Is there a packet's
>> version, or whatever?
>
> I will add some code that dumps the whole packet.
>
>> It's really strange that an "Update" message could fail. After a
>> quick look at the code, I think it should fail at line 513. Could
>> you see if changing
>
> I will try.
>
>
> Jernej

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Re: [Babel-users] Bucket full, dropping packet
> Hm, I thought that Babel was tested on large networks and that it was
> tested on simulated large networks?

Babel has been tested in one large network (1500 nodes). It took 20 minutes to converge, and worked fine after convergence time. I fixed the initial convergence issues (too many broadcasts at boot), but didn't receive any more feedback from the company who had done the test (I didn't want to sign any non-disclosure agreements).

The amount of state that a Babel node maintains is proportional to v*r, where v is the number of neighbours and r the number of routes. Your network is somewhat unusual in that it has some very central nodes -- 75 neighbours max, I believe --, which is something that Babel doesn't like very much. The protocol should be able to deal with that (75 * 500 is only 37,500), but the implementation will likely need some tuning. I'm hoping that you can help me do the tuning.

> Or are we now the largest network using it and this is why we are
> getting in all this trouble?

You are the largest Babel network right now. I'm very excited about your deployment, and I'm looking forward to tuning the babeld implementation to work well enough for your needs.

> So this is just another academic project which looks good on the paper
> but in practice it is not really production grade?

Most academic projects produce no useful software, just simulation. We are doing our best to provide production-quality software, and as a matter of fact babeld is running right now in a production network of 200 nodes. However, Nexedi's network has been designed with Babel in mind, and it doesn't have any central nodes -- all nodes have roughly the same number of neighbours.

> We had to turn off Babel in the network and go back to OLSRv1.

Which is a reasonable thing to do in order to solve your short-term issues.

I hope that you'll remain open to working with me to get babeld to scale to your needs -- I assure you that it can be done, but I need profiling data in order to do that.

-- Juliusz
Re: [Babel-users] Bucket full, dropping packet
> Attaching some more dumps.

Thanks, Kosko.

These are all IPv4 retractions, and they look just fine to me. I'd need to see the full packet to be sure, but it probably means that there's some ambiguity in the code about handling retractions with no suitable next hop. I'll have a look when I have time.

In the meantime, any chance you could dump a full packet with an unparseable TLV?

> It seems that many babeld instances have started using a lot of CPU and
> routes are unstable.

I'd be grateful for some profiling output. I've deliberately used very simple data structures in babeld, and there's a lot of optimisation that could be done. I suspect that you're running into limitations of the kernel interface, which is pretty horrible right now.

-- Juliusz
Re: [Babel-users] Bucket full, dropping packet
What about Python scapy to generate bogus packets?

Saverio

On 8 Dec 2015 18:01, "Dave Taht" wrote:
> On Tue, Dec 8, 2015 at 5:58 PM, Jernej Kos wrote:
> > Hello!
> >
> > On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
> >> Yes, that's expected. Please increase the limits, be bold, multiply
> >> them by 20.
> >
> > It seems that raising the limits solved the problem. Thanks!
> >
> > We are still on the lookout for unparsable packets ;-)
>
> I would like to see someone working on a babel fuzzer, or does someone
> know of a tool that could generate tons of packets bogus in every way
> possible?
>
> > Jernej
Re: [Babel-users] Bucket full, dropping packet
Hello!

On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
> Yes, that's expected. Please increase the limits, be bold, multiply them
> by 20.

It seems that raising the limits solved the problem. Thanks!

We are still on the lookout for unparsable packets ;-)

Jernej
Re: [Babel-users] Bucket full, dropping packet
On Tue, Dec 8, 2015 at 5:58 PM, Jernej Kos wrote:
> Hello!
>
> On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
>> Yes, that's expected. Please increase the limits, be bold, multiply them
>> by 20.
>
> It seems that raising the limits solved the problem. Thanks!
>
> We are still on the lookout for unparsable packets ;-)

I would like to see someone working on a babel fuzzer, or does someone know of a tool that could generate tons of packets bogus in every way possible?

>
> Jernej
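One simple way to produce bogus packets is mutation: take a packet body that is known to be valid, wrap it in the Babel packet header (magic 42, version 2, 16-bit body length, per RFC 6126) and overwrite a few body bytes at random. The sketch below is only that, not an existing tool; `next_rand()`, `build_packet()` and `mutate_body()` are hypothetical names, and sending the result to a babeld instance is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Small deterministic PRNG (an LCG) so that a failing case can be
   replayed from its seed. Only meant for generating test input. */
static unsigned long next_rand(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return *state >> 16;
}

/* Wraps body in a Babel packet header: magic 42, version 2, body
   length in network byte order. Returns the packet length, or 0 if
   the body is too long or out is too small. */
static size_t build_packet(unsigned char *out, size_t outlen,
                           const unsigned char *body, size_t bodylen)
{
    if(bodylen > 0xFFFF || outlen < 4 + bodylen)
        return 0;
    out[0] = 42;
    out[1] = 2;
    out[2] = (unsigned char)(bodylen >> 8);
    out[3] = (unsigned char)(bodylen & 0xFF);
    memcpy(out + 4, body, bodylen);
    return 4 + bodylen;
}

/* Overwrites `flips` randomly chosen body bytes with random values.
   The 4-byte header is left valid so a parser gets as far as the
   TLVs instead of rejecting the packet immediately. */
static void mutate_body(unsigned char *pkt, size_t pktlen,
                        unsigned flips, unsigned long *seed)
{
    unsigned i;

    assert(pkt != NULL && seed != NULL);
    if(pktlen <= 4)
        return;
    for(i = 0; i < flips; i++) {
        size_t pos = 4 + (size_t)(next_rand(seed) % (pktlen - 4));
        pkt[pos] = (unsigned char)(next_rand(seed) & 0xFF);
    }
}
```

Keeping the seed with each generated packet makes a crash reproducible. A fuller fuzzer would also corrupt the header and the TLV length fields on purpose, since those are the places where parsers tend to over-read.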
Re: [Babel-users] Bucket full, dropping packet
Hello!

On 07. 12. 2015 10:16, Juliusz Chroboczek wrote:
> How large is your network? How many routes through how many neighbours?

It is around 550 routes and there is one node, which currently has 75 neighbours.

Is this enough to trigger the limits?

Jernej
Re: [Babel-users] Bucket full, dropping packet
> It is around 550 routes and there is one node, which currently has 75
> neighbours.

Excellent.

> Is this enough to trigger the limits?

Yes, that's expected. Please increase the limits, be bold, multiply them by 20.

-- Juliusz