On Wed, Jul 04, 2012 at 09:43:13AM -0300, Guido Iribarren wrote:
> On Wed, Jul 4, 2012 at 6:12 AM, Simon Wunderlich
> <simon.wunderl...@s2003.tu-chemnitz.de> wrote:
> > Hello Guido,
> >
> > On Tue, Jul 03, 2012 at 05:07:17PM -0300, Guido Iribarren wrote:
> >> Hello there again,
> >> I have observed a problem since updating to 2012.2 and enabling BLAII
> >>
> >> I'm compiling logs to understand what's happening, but as always,
> >> reading logs only gets me more lost :(
> >> So here i am again begging for help
> >
> > There are some debug levels for BLA as well, and you can now get the
> > claimlist with batctl (which is basically the list of clients a gateway
> > feels responsible for) - this may help for debugging. But first,
> > we should clarify some more details for your setup.
> Yes, I've seen the cl command, but didn't completely understand how to
> interpret it. For example, right now I see the clients claimed in the
> cl of mesh nodes, and even the same client claimed in different nodes.
>
> ( when I say mesh nodes, and in the rest of the email, i'm referring
> to 
> http://www.open-mesh.org/wiki/batman-adv/Bridge-loop-avoidance-II#Definitions
> )
> 
> i.e. sample mesh-nodes:
> root@charly:~# batctl cl
> Claims announced for the mesh bat0 (orig            charly, group id 6412)
>    Client               VID      Originator        [o] (CRC )
>  * 00:25:d3:f5:93:76 on    -1 by            charly [x] (77f9)
>  * f8d1113b6e66_eth0 on    -1 by            charly [x] (77f9)
> 
> root@hquilla:~# batctl cl
> Claims announced for the mesh bat0 (orig           hquilla, group id 82cb)
>    Client               VID      Originator        [o] (CRC )
>  * 00:25:d3:f5:93:76 on    -1 by           hquilla [x] (c72e)
>  * 00:24:81:4b:ea:6d on    -1 by           hquilla [x] (c72e)
> 
> maybe that's fine because they have different group ids? (??)

Yup. They are not interconnected via Ethernet, so they are in different
backbones and have a different group id.
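
As a rough illustration, here is a minimal Python sketch (not part of
batctl) that reads the header line of "batctl cl" and groups originators
by group id. It assumes the header format shown in the output quoted
above; the names parse_header and backbones are made up for this example.

```python
import re
from collections import defaultdict

# Matches the header line as it appears in the output quoted above
# (assumed format; other batctl versions may print it differently).
HEADER_RE = re.compile(
    r"Claims announced for the mesh (\S+) "
    r"\(orig\s+(\S+), group id ([0-9a-f]{4})\)"
)

def parse_header(line):
    """Return (mesh_iface, originator, group_id), or None if no match."""
    m = HEADER_RE.search(line)
    return m.groups() if m else None

def backbones(header_lines):
    """Group originators by group id (different id: different backbone)."""
    groups = defaultdict(list)
    for line in header_lines:
        parsed = parse_header(line)
        if parsed:
            _, orig, gid = parsed
            groups[gid].append(orig)
    return dict(groups)

headers = [
    "Claims announced for the mesh bat0 (orig            charly, group id 6412)",
    "Claims announced for the mesh bat0 (orig           hquilla, group id 82cb)",
]
print(backbones(headers))
```

With the two headers above, charly and hquilla end up under different
group ids, which is why both can claim 00:25:d3:f5:93:76 at the same time.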

If there are other gateways on the backbone with claims, you should see them
in the claim list as well.

> is there any documentation on the cl output?

Yup, you can find it here:

http://www.open-mesh.org/wiki/batman-adv/Understand-your-batman-adv-network

> as far as I could interpret, CRC "identifies" a particular version of a table,

yup.

> [o] = [x] means "this is claimed by myself"

yup.

> group id identifies different backbones (like in this case:)
> http://www.open-mesh.org/wiki/batman-adv/Bridge-loop-avoidance-Testcases#Two-LANs-connected-by-one-mesh
> 

yup.

> and VID, is always set to -1 :P
> oh, maaaaybe it's vlan id (?) since i'm not using VLANs

it means you have no VLAN here.
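
To summarize the columns discussed above, here is a small Python sketch
that splits one claim line into its fields. It is only an illustration
based on the output format quoted in this thread, not something batctl
provides; parse_claim and the dictionary keys are hypothetical names.

```python
import re

# One claim line, as in the output quoted above (assumed format):
#  * <client> on <vid> by <originator> [x| ] (<crc>)
CLAIM_RE = re.compile(
    r"^\s*\*\s+(\S+)\s+on\s+(-?\d+)\s+by\s+(\S+)\s+\[(.)\]\s+\(([0-9a-f]{4})\)"
)

def parse_claim(line):
    """Return the fields of one claim line as a dict, or None if no match."""
    m = CLAIM_RE.match(line)
    if m is None:
        return None
    client, vid, orig, flag, crc = m.groups()
    return {
        "client": client,
        "vlan": None if int(vid) == -1 else int(vid),  # -1: no VLAN
        "originator": orig,
        "own": flag == "x",  # [x]: claimed by the node printing the list
        "crc": crc,          # identifies a version of the claim table
    }

print(parse_claim(" * 00:25:d3:f5:93:76 on    -1 by            charly [x] (77f9)"))
```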

> 
> >>
> >> the setup is the same I described in yesterday's attachment, but
> >> what's not pictured is an ethernet cable between colmena-casa and
> >> f8d11504758.
> >> f8d11504758 is the only router that connects to the internet (through
> >> WAN cable), and it's also the only one that has dnsmasq running and
> >> gw_mode=server.
> >> All the other nodes have gw_mode=client
> >>
> >> All of the nodes have bridge_loop_avoidance=1
> >> (even though there are no other utp connections, so it could in fact
> >> be enabled only on colmena-casa and f8d11504758)
> >>
> >> with this setup, dhcp requests from the mesh sometimes get "lost",
> >> either they don't reach f8d11504758 or the reply doesn't get out
> >
> > Questions:
> >  * which node runs the DHCP server? colmena-casa, f8d11504758 or
> >    something else?
> Only the node f8d11504758 runs a DHCP server (dnsmasq) on its interface br-lan
> no other dhcp server is running on the network
> 

OK.

> >  * at which point is DHCP getting lost? is the DISCOVER/REQUEST from
> >    the client getting lost, or the reply from the server?
> 
> Well, I just managed to get a clarifying tcpdump!
> hquilla sent a select (REQUEST) that reached the wlan0-2 (mesh)
> interface of f8d11504758 and it was silently dropped (didn't appear on
> a batctl td of bat0)
> this repeated several times, until a lucky REQUEST managed to pass
> through, was sniffed at bat0, and got a reply from dnsmasq
> 
> I couldn't see any difference between the unlucky and lucky REQUESTs
> or DISCOVERs,
> but running a "batctl cl -w1" did the trick:
> when the client is currently claimed by f8d11504758 as in
>  *      hquilla_eth0 on    -1 by      f8d11504758 [x] (d38b)
> 
> both the REQUESTs and DISCOVERs reach dnsmasq fine
> 
> but if the client is currently claimed by colmena-casa as in
>  *      hquilla_eth0 on    -1 by      colmena-casa [ ] (3d7f)
> these discover/requests get dropped by batman when they arrive through wlan0-2
> 

Ah, this helps. If the client is claimed by colmena-casa, the request
should go from hquilla to colmena-casa via mesh, and from colmena-casa to
f8d11504758 via LAN.

I guess the problem is the interaction between BLA and the gateway feature.
The DHCP request is sent via unicast to f8d11504758, but the destination
address of the frame is still broadcast. The BLA implementation in
f8d11504758 will then think that colmena-casa has also received the
broadcast (but it didn't), and therefore drop it.

I'll send a patch for that soon ...
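
The hypothesis can be written down as a toy model in Python. This is
only a sketch of the behaviour described above, under the assumption that
the check looks at nothing but the frame's destination address and the
current claim; it is not the kernel code, and bla_rx_drops and its
parameters are made-up names.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def bla_rx_drops(frame_dst, src_client, claims, me):
    """Toy model of the receive-side check described above.

    claims maps a client to the backbone gateway currently claiming it.
    Returns True if a frame arriving from the mesh would be dropped.
    """
    if frame_dst != BROADCAST:
        return False
    claimed_by = claims.get(src_client)
    # Broadcast from a client claimed by another gateway: the model assumes
    # that gateway already delivered it to the LAN, so it is dropped here.
    # It does not know the frame actually travelled as a unicast.
    return claimed_by is not None and claimed_by != me

# Claim held by colmena-casa: the DHCP request is dropped at f8d11504758.
print(bla_rx_drops(BROADCAST, "hquilla_eth0",
                   {"hquilla_eth0": "colmena-casa"}, "f8d11504758"))
# Claim held by f8d11504758 itself: the request passes.
print(bla_rx_drops(BROADCAST, "hquilla_eth0",
                   {"hquilla_eth0": "f8d11504758"}, "f8d11504758"))
```

The two calls correspond to the two "batctl cl -w1" states you observed.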

> >
> > So DHCP is only having problems when gw-mode is turned on colmena-casa
> > and f8d11504758?
> 
> gw-mode is activated in all mesh nodes, not only in colmena-casa and
> f8d11504758
> it's set to client on every node except f8d11504758, which has gw_mode=server
> 
> As far as i can recall, disabling gw_mode=client in every mesh node,
> solved the problem.
> But now that i found out about this "batctl td" thing, i'm in doubt
> about the validity of the previous statement :(
> i should check again and report.
> 

That would at least match the hypothesis. :)

[removed the rest of the text. will send the patch soon ...]

Cheers,
        Simon
