Re: [Babel-users] Bucket full, dropping packet

2015-12-14 Thread Matthieu Boutier
>> Matthieu, do you understand why that is?  Is there a way to optimise away
>> conflict_solution in the easy case?
> 
> I think so.  Will fix it.

The attached patch should solve the problem.  Since a conflict needs a specific 
route, the first loop now iterates over specific routes only.  If there is no 
such route, the function exits immediately.  Otherwise, we need to traverse the 
RIB once for each specific route.  Further improvements may be possible (such as 
avoiding the malloc/free call), but I think this should be sufficient.
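
The control flow described above can be sketched roughly as follows.  This is
not the attached patch: `struct route_entry`, `is_specific` and
`count_conflict_checks` are hypothetical names, and the sketch assumes that
specific routes are stored before all other routes in the table, which is the
invariant the patch's own check refers to.  It only counts how many pairs would
be examined, to make the cost visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified route entry; babeld's real structures differ. */
struct route_entry {
    int is_specific;    /* non-zero for a source-specific route */
};

/* Assumes specific routes are sorted before non-specific ones.
   Returns the number of (specific route, other route) pairs examined:
   zero when the table contains no specific route at all. */
size_t
count_conflict_checks(const struct route_entry *rib, size_t n)
{
    size_t i, j, pairs = 0;

    assert(rib != NULL || n == 0);

    /* First loop: specific routes only; stops at the first other route. */
    for(i = 0; i < n && rib[i].is_specific; i++) {
        /* Second loop: one traversal of the whole table per specific route. */
        for(j = 0; j < n; j++) {
            if(j != i)
                pairs++;    /* the real conflict test would go here */
        }
    }
    return pairs;
}
```

With no specific route the outer loop ends on its first test, so the easy case
costs a single comparison instead of a full traversal.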

So far it has been tested on our testbed network, where it doesn't seem to 
crash anything.  Still, some review would be valuable before real deployment.

Matthieu



0001-Optimization-by-allowing-traversing-installed-specif.patch
Description: Binary data
___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users

Re: [Babel-users] Bucket full, dropping packet

2015-12-14 Thread Juliusz Chroboczek
> The attached patch should solve the problem.

Nice, applied.

> +    if(!check_specific_first())
> +        fprintf(stderr, "Invariant failed: specific routes first in RIB.\n");

#ifdef DEBUG, or if(debug_level >= 2) ?

-- Juliusz




Re: [Babel-users] Bucket full, dropping packet

2015-12-14 Thread Henning Rogge
On Mon, Dec 14, 2015 at 8:15 PM, Dave Taht wrote:
> Is there a reliable way of determining that an underlying interface is a 
> bridge?

A local bridge/wifi? Sure...

just look into the source of the brctl tool... it can give you a list
of all bridge interfaces.

A "wifi bridge" in an external device? That would be a job for DLEP... ;)

Henning



Re: [Babel-users] Bucket full, dropping packet

2015-12-14 Thread Matthieu Boutier
> #ifdef DEBUG, or if(debug_level >= 2) ?

Well, I was not sure about this one.  The problem with debug_level is that it 
produces output that is too verbose; it's not just "checks".  I was rather 
thinking about having a test-mode version of babeld, which leaves the clean 
babeld code as-is and adds some code around it, like mocks. (?)

Matthieu




Re: [Babel-users] Bucket full, dropping packet

2015-12-13 Thread Mitar
Hi!

But it will be more complicated with upgrades, and more complicated
to ensure that things are really the same (same patched version of
Babel, same kernel version, TCP/IP stack, sysctl settings, etc.).


Mitar

On Sun, Dec 13, 2015 at 1:00 PM, Juliusz Chroboczek wrote:
>>> Ok, I'll see on Monday if I can get an extra VM before Christmas.
>>
>> Which VM system are you using? We might be able to generate you a
>> ready-made image.
>
> Please don't -- I'll let our system administrators clone their usual
> VMWare image, it's better for everyone if I use what they're familiar
> with.
>
> -- Juliusz



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: [Babel-users] Bucket full, dropping packet

2015-12-13 Thread Matthieu Boutier
> Matthieu, do you understand why that is?  Is there a way to optimise away
> conflict_solution in the easy case?

I think so.  Will fix it.  I may call you beforehand.

Matthieu




Re: [Babel-users] Bucket full, dropping packet

2015-12-13 Thread Dave Taht
On Sun, Dec 13, 2015 at 2:52 PM, Juliusz Chroboczek wrote:
>> Ok, I can do some profiling on the babeld that is running on the VPN
>> server with the large number of links. Just tell me what profiling data
>> do you want? Should I just compile a debug build and run babeld through
>> callgrind or do you have something else in mind?
>
> I'm not familiar with callgrind -- I've had good results with both "perf
> record" and gprof.  But yes, callgrind should be fine.
>
> I need to find out where the CPU time is going.  I suspect either the
> quadratic loop in xroute.c, or linear-time route selection in route.c.
> I intend to fix both, but I'd like to be sure.
>
>> Yes, you only need to establish a VPN connection to our server using
>> tunneldigger-client [1] (it compiles on Debian) and run babeld on the
>> VPN interface. We only need to allocate an IPv4 address for you so there
>> will be no conflicts.
>
> Ok, I'll see on Monday if I can get an extra VM before Christmas.

I have a half dozen machines all over the world, courtesy of linode.
Can spin a new one up for you in a matter of minutes.

>
> -- Juliusz
>


Re: [Babel-users] Bucket full, dropping packet

2015-12-13 Thread valent.turko...@gmail.com
On 13 December 2015 at 22:00, Juliusz Chroboczek <j...@pps.univ-paris-diderot.fr> wrote:

> >> Ok, I'll see on Monday if I can get an extra VM before Christmas.
> >
> > Which VM system are you using? We might be able to generate you a
> > ready-made image.
>
> Please don't -- I'll let our system administrators clone their usual
> VMWare image, it's better for everyone if I use what they're familiar
> with.
>


Hi Juliusz and all,
I have a VPS running babel and tunneldigger which is part of the Wlan Slovenia
network. If it helps, I can give you, and anybody else who can troubleshoot,
instant root access to this server; just provide me with your public ssh
keys.

Thanks,
Valent.

Re: [Babel-users] Bucket full, dropping packet

2015-12-13 Thread Juliusz Chroboczek
>> Ok, I'll see on Monday if I can get an extra VM before Christmas.
>
> Which VM system are you using? We might be able to generate you a
> ready-made image.

Please don't -- I'll let our system administrators clone their usual
VMWare image, it's better for everyone if I use what they're familiar
with.

-- Juliusz



Re: [Babel-users] Bucket full, dropping packet

2015-12-12 Thread Juliusz Chroboczek
>> Oh, sadly not 75 max. The number of neighbours of one node is
>> potentially very large, because all nodes with Internet uplink
>> connect to the VPN server. So, all those nodes are then neighbours of
>> that VPN server node. Currently this is for example 140 nodes on one

> So yes, 75 is the current number of Babel peers, but there are otherwise
> many more peers and there will be even more. How does babeld scale in
> this case (BTW, is this a problem with the protocol or the implementation)?

The implementation.  The protocol should be able to scale very well.

Look at the code at xroute.c line 304 -- this is quadratic in the number
of the routes in the kernel, and is done every time the kernel tells us
the set of routes has changed.  This is easily fixed (the kernel provides
us with deltas, we don't need to take a complete dump every time), but
I need some profiling data before I spend time on that.

Another issue is that a Babel node keeps a redundant routing table -- it
keeps routes that are not installed so it can recover from a failure
immediately.  The current implementation doesn't limit the amount of
redundancy -- it will keep up to (v*n) routes.  The obvious solution is to
be just 3-redundant -- keep just the best 3 routes to a given destination.
Obviously, this means that route selection will be suboptimal in some
cases, and that recovery from loss will be slower, but something's got to
give.
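
A minimal sketch of the "keep just the best 3 routes" idea, under stated
assumptions: `struct candidate`, `struct candidate_set` and `offer_route` are
hypothetical names, not babeld code, and "best" is taken to mean lowest metric
only.  A real implementation would also have to replace an existing entry when
the same neighbour sends a new update, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REDUNDANCY 3    /* "3-redundant", as proposed above */

/* Hypothetical candidate route: only the fields the sketch needs. */
struct candidate {
    int neighbour;            /* identifier of the advertising neighbour */
    unsigned short metric;    /* lower is better */
};

/* Candidates kept for one destination, sorted by increasing metric. */
struct candidate_set {
    struct candidate routes[MAX_REDUNDANCY];
    size_t count;
};

/* Offer a route.  Returns 1 if it was kept, 0 if it was discarded because
   MAX_REDUNDANCY routes that are at least as good are already stored. */
int
offer_route(struct candidate_set *set, struct candidate c)
{
    size_t pos, i;

    assert(set != NULL && set->count <= MAX_REDUNDANCY);

    /* Find the insertion point that keeps the array sorted. */
    pos = 0;
    while(pos < set->count && set->routes[pos].metric <= c.metric)
        pos++;

    if(pos == MAX_REDUNDANCY)
        return 0;

    /* Room left: grow.  Otherwise the current worst route falls off. */
    if(set->count < MAX_REDUNDANCY)
        set->count++;

    /* Shift worse routes down by one slot. */
    for(i = set->count - 1; i > pos; i--)
        set->routes[i] = set->routes[i - 1];
    set->routes[pos] = c;
    return 1;
}
```

The state kept per destination is then bounded by a constant rather than by
the number of neighbours, at the cost described above: a route that was
discarded cannot be used for immediate recovery.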

Mitar, Jernej -- this is exciting stuff, and I'm pretty sure we'll be able
to solve it.

> But I am thinking that we do need some faster way to debug these
> issues than this ping-pong over the mailing list, with what we should
> run and then report back and so on. What about you getting one cheap
> TPLink, go to https://nodes.wlan-si.net/, register a node and deploy
> it?

Very good idea.  Is it possible to use a Debian box?  With all the
profiling and debugging tools, it would be more useful.

-- Juliusz



Re: [Babel-users] Bucket full, dropping packet

2015-12-11 Thread Jernej Kos
Hello!

On 08. 12. 2015 17:58, Jernej Kos wrote:
> We are still on the lookout for unparsable packets ;-)

Got one!

Couldn't parse packet (8, 14) from fe80::2ff:1bff:fe10:3d34 on digger1438.
Packet dump: 08 0e 01 00 20 00 06 40 32 8e ff ff 0a fe 00 08
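
Read against the Update TLV layout in RFC 6126 (type 8, then length, AE, flags,
prefix length, omitted, interval, seqno, metric, prefix), the 16 bytes above
would decode as AE 1 (IPv4), prefix length 32, interval 1600 centiseconds,
seqno 0x328e, metric 0xffff (a retraction) and prefix 10.254.0.8.  The decoder
below is a sketch for checking such dumps by hand; `struct update_tlv` and
`decode_update` are hypothetical names, it is not babeld's parser, and it
ignores the prefix-compression state that a real parser has to track.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fields of a Babel Update TLV, following the RFC 6126 layout. */
struct update_tlv {
    unsigned char ae, flags, plen, omitted;
    unsigned short interval;     /* centiseconds */
    unsigned short seqno;
    unsigned short metric;       /* 0xFFFF means the route is retracted */
    unsigned char prefix[16];    /* bytes carried on the wire, zero-padded */
    size_t prefix_len;           /* number of prefix bytes on the wire */
};

/* Returns 0 on success, -1 if buf does not hold a well-formed Update TLV. */
int
decode_update(const unsigned char *buf, size_t buflen, struct update_tlv *out)
{
    size_t len, total;

    assert(buf != NULL && out != NULL);

    if(buflen < 2 || buf[0] != 8)
        return -1;
    len = buf[1];
    if(len < 10 || buflen < 2 + len)
        return -1;               /* fixed fields alone take 10 bytes */

    out->ae = buf[2];
    out->flags = buf[3];
    out->plen = buf[4];
    out->omitted = buf[5];
    out->interval = (unsigned short)((buf[6] << 8) | buf[7]);
    out->seqno = (unsigned short)((buf[8] << 8) | buf[9]);
    out->metric = (unsigned short)((buf[10] << 8) | buf[11]);

    /* The wire carries ceil(plen / 8) bytes minus the omitted ones. */
    total = ((size_t)out->plen + 7) / 8;
    if(total > sizeof(out->prefix) || out->omitted > total)
        return -1;
    out->prefix_len = total - out->omitted;
    if(len < 10 + out->prefix_len)
        return -1;

    memset(out->prefix, 0, sizeof(out->prefix));
    memcpy(out->prefix, buf + 12, out->prefix_len);
    return 0;
}
```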


Jernej




Re: [Babel-users] Bucket full, dropping packet

2015-12-11 Thread Mitar
Hi!

Hm, I thought that Babel had been tested on large networks and on
simulated large networks? Or are we now the largest network using it,
and is this why we are getting into all this trouble? So this
is just another academic project which looks good on paper but in
practice is not really production grade?

We had to turn off Babel in the network and go back to OLSRv1. So much
for a smooth transition.


Mitar

On Fri, Dec 11, 2015 at 10:29 AM, Jernej Kos wrote:
> Hello!
>
> On 11. 12. 2015 18:47, Matthieu Boutier wrote:
>> Do you know where this version comes from?  Is there a packet
>> version, or whatever?
>
> I will add some code that dumps the whole packet.
>
>> It's really strange that an "Update" message could fail.  After a
>> quick look at the code, I think it should fail at line 513.  Could
>> you see if changing
>
> I will try.
>
>
> Jernej
>
>



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: [Babel-users] Bucket full, dropping packet

2015-12-11 Thread Juliusz Chroboczek
> Hm, I thought that Babel was tested on large networks and that it was
> tested on simulated large networks?

Babel has been tested in one large network (1500 nodes).  It took 20
minutes to converge, and worked fine after convergence time.  I fixed the
initial convergence issues (too many broadcasts at boot), but didn't
receive any more feedback from the company who had done the test (I didn't
want to sign any non-disclosure agreements).

The amount of state that a Babel node maintains is proportional to v*r,
where v is the number of neighbours and r the number of routes.  Your
network is somewhat unusual in that it has some very central nodes (75
neighbours max, I believe), which is something that Babel doesn't like
very much.  The protocol should be able to deal with that (75 * 500 is
less than 40,000), but the implementation will likely need some tuning.
I'm hoping that you can help me do the tuning.

> Or are we now the largest network using it and this is why we are
> getting in all this trouble?

You are the largest Babel network right now.  I'm very excited about your
deployment, and I'm looking forward to tuning the babeld implementation to
work well enough for your needs.

> So this is just another academic project which looks good on the paper
> but in practice it is not really production grade?

Most academic projects produce no useful software, just simulation.  We
are doing our best to provide production-quality software, and as a matter
of fact babeld is running right now in a production network of 200 nodes.
However, Nexedi's network has been designed with Babel in mind, and it
doesn't have any central nodes -- all nodes have roughly the same number
of neighbours.

> We had to turn off Babel in the network and go back to OLSRv1.

Which is a reasonable thing to do in order to solve your short-term
issues.  I hope that you'll remain open to working with me to get babeld
to scale to your needs -- I assure you that it can be done, but I need
profiling data in order to do that.

-- Juliusz



Re: [Babel-users] Bucket full, dropping packet

2015-12-11 Thread Juliusz Chroboczek
> Attaching some more dumps.

Thanks, Kosko.

These are all IPv4 retractions, and they look just fine to me.  I'd need
to see the full packet to be sure, but it probably means that there's some
ambiguity in the code about handling retractions with no suitable next
hop.

I'll have a look when I have time.  In the meantime, any chance you could
dump a full packet with an unparseable TLV?

> It seems that many babeld instances have started using a lot of CPU and
> routes are unstable.

I'd be grateful for some profiling output.  I've deliberately used very
simple data structures in babeld, and there's a lot of optimisation that
could be done.  I suspect that you're running into limitations of the
kernel interface, which is pretty horrible right now.

-- Juliusz



Re: [Babel-users] Bucket full, dropping packet

2015-12-09 Thread Saverio Proto
What about Python's Scapy to generate bogus packets?
Saverio
On 08 Dec 2015 18:01, "Dave Taht" wrote:

> On Tue, Dec 8, 2015 at 5:58 PM, Jernej Kos wrote:
> > Hello!
> >
> > On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
> >> Yes, that's expected.  Please increase the limits, be bold, multiply them
> >> by 20.
> >
> > It seems that raising the limits solved the problem. Thanks!
> >
> > We are still on the lookout for unparsable packets ;-)
>
> I would like to see someone working on a babel fuzzer, or does someone
> know of a tool that could generate tons of packets bogus in every way
> possible?
> >
> > Jernej
> >
> >

Re: [Babel-users] Bucket full, dropping packet

2015-12-08 Thread Jernej Kos
Hello!

On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
> Yes, that's expected.  Please increase the limits, be bold, multiply them
> by 20.

It seems that raising the limits solved the problem. Thanks!

We are still on the lookout for unparsable packets ;-)


Jernej




Re: [Babel-users] Bucket full, dropping packet

2015-12-08 Thread Dave Taht
On Tue, Dec 8, 2015 at 5:58 PM, Jernej Kos wrote:
> Hello!
>
> On 07. 12. 2015 17:14, Juliusz Chroboczek wrote:
>> Yes, that's expected.  Please increase the limits, be bold, multiply them
>> by 20.
>
> It seems that raising the limits solved the problem. Thanks!
>
> We are still on the lookout for unparsable packets ;-)

I would like to see someone working on a babel fuzzer, or does someone
know of a tool that could generate tons of packets bogus in every way
possible?
>
> Jernej
>
>


Re: [Babel-users] Bucket full, dropping packet

2015-12-07 Thread Jernej Kos
Hello!

On 07. 12. 2015 10:16, Juliusz Chroboczek wrote:
> How large is your network?  How many routes through how many neighbours?

It is around 550 routes and there is one node, which currently has 75
neighbours. Is this enough to trigger the limits?


Jernej




Re: [Babel-users] Bucket full, dropping packet

2015-12-07 Thread Juliusz Chroboczek
> It is around 550 routes and there is one node, which currently has 75
> neighbours.

Excellent.

> Is this enough to trigger the limits?

Yes, that's expected.  Please increase the limits, be bold, multiply them
by 20.

-- Juliusz
