On Sat, May 12, 2018 at 02:57:23AM +0800, Marek Lindner wrote:
> Whenever a new VLAN is created on top of batman virtual interfaces
> the batman-adv kernel module creates internal structures to track
> the status of said VLAN. Amongst other things, the MAC address of
> the VLAN interface itself has to be stored.
>
> Without this change a VLAN and its infrastructure could be created
> while the interface MAC address is not stored without triggering
> any error, thus creating issues in other parts of the code.
>
> Prevent the VLAN from being created if the MAC address can not
> be stored.
>
> Fixes: 952cebb57518 ("batman-adv: add per VLAN interface attribute framework")
>
> Signed-off-by: Marek Lindner <[email protected]>

I tested this patch and so far could not spot any issues in either
dmesg or logread.

I've added these patches to a branch for Gluon here:
https://github.com/T-X/gluon/tree/tt-vlan-patched

And used these images (warning, they have my SSH public key added):
https://metameute.de/~tux/Freifunk/firmware/ffh-tt-patched/

So far I've only tested with an isolated two-node setup.

I started by restarting the network multiple times:
~~~~~
root@freifunk-b0487ae7f31e:~# rm /tmp/vlan-test.log; trap '' SIGPIPE; \
  for i in `seq 1 30`; do \
    echo "Starting network restart $i" >> /tmp/vlan-test.log; \
    /etc/init.d/network restart; sleep 5; \
    if batctl tl | grep " 0 \["; then \
      echo "BROKEN - aborting" >> /tmp/vlan-test.log; \
      batctl tl >> /tmp/vlan-test.log; sleep 3; \
      echo "waiting..." >> /tmp/vlan-test.log; \
      batctl tl >> /tmp/vlan-test.log; break; \
    fi; \
  done; echo "finished" >> /tmp/vlan-test.log
~~~~~
And the result is the following - which looks odd?
~~~~~
root@freifunk-b0487ae7f31e:~# cat /tmp/vlan-test.log
Starting network restart 1
Starting network restart 2
Starting network restart 3
BROKEN - aborting
[B.A.T.M.A.N. adv 2018.1, MainIF/MAC: primary0/66:c6:34:9d:58:43 (bat0/b0:48:7a:e7:f3:1e BATMAN_IV), TTVN: 1]
Client             VID Flags    Last seen (CRC       )
9a:86:17:9c:5f:4f   -1 [.P.X..]   0.000   (0x0ce60e81)
b0:48:7a:e7:f3:1e    0 [.PN...]   0.000   (0x00000000)
b0:48:7a:e7:f3:1e   -1 [.PN...]   0.000   (0x0ce60e81)
waiting...
[B.A.T.M.A.N. adv 2018.1, MainIF/MAC: primary0/66:c6:34:9d:58:43 (bat0/b0:48:7a:e7:f3:1e BATMAN_IV), TTVN: 2]
Client             VID Flags    Last seen (CRC       )
b0:48:7a:e7:f3:1e    0 [.P....]   0.000   (0xc4c7d9cf)
b0:48:7a:e7:f3:1e   -1 [.P....]   0.000   (0x62afdc24)
finished
~~~~~
However, this oddity seems to be temporary: the local TT now looks
just fine, without the node having been rebooted:
~~~~~
root@freifunk-b0487ae7f31e:~# batctl tl
[B.A.T.M.A.N. adv 2018.1, MainIF/MAC: primary0/66:c6:34:9d:58:43 (bat0/b0:48:7a:e7:f3:1e BATMAN_IV), TTVN: 4]
Client             VID Flags    Last seen (CRC       )
33:33:ff:40:f8:dc   -1 [.P....]   0.000   (0xd118c666)
b0:48:7a:e7:f3:1e    0 [.P....]   0.000   (0xc4c7d9cf)
33:33:00:00:00:02   -1 [.P....]   0.000   (0xd118c666)
33:33:ff:00:00:01   -1 [.P....]   0.000   (0xd118c666)
33:33:00:02:10:01   -1 [.P....]   0.000   (0xd118c666)
01:00:5e:00:00:01   -1 [.P....]   0.000   (0xd118c666)
b0:48:7a:e7:f3:1e   -1 [.P....]   0.000   (0xd118c666)
33:33:ff:e7:f3:1e   -1 [.P....]   0.000   (0xd118c666)
33:33:00:00:00:01   -1 [.P....]   0.000   (0xd118c666)
~~~~~
Or is it expected that a TT VLAN entry with an "N" flag will have
the CRC set to 0x00000000?
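
To look for exactly this case, a filter on the CRC column might be more
precise than the grep on the VID column I used above. A rough sketch: the
function name tt_zero_crc is made up, and it assumes the CRC is the last
whitespace-separated field of every entry line, as in the output above.

```shell
# Sketch: print local TT entries whose CRC field is all zeros.
# Assumption: the CRC is the last whitespace-separated field of each
# entry line; header lines end in something else and are skipped.
tt_zero_crc() {
	awk '$NF == "(0x00000000)"'
}

# Intended use on a node: batctl tl | tt_zero_crc
```

This only filters text, so it can also be run over the saved
/tmp/vlan-test.log afterwards.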
I also noticed that VLAN 0 is added to bat0 by 8021q right
after bat0 gets created and activated:
~~~~~
Sun Feb 25 14:20:28 2018 kern.info kernel: [ 7852.985327] batman_adv: bat0: Adding interface: primary0
Sun Feb 25 14:20:28 2018 kern.info kernel: [ 7852.990712] batman_adv: bat0: Interface activated: primary0
Sun Feb 25 14:20:28 2018 kern.info kernel: [ 7853.025080] 8021q: adding VLAN 0 to HW filter on device bat0
Sun Feb 25 14:20:28 2018 daemon.notice netifd: Interface 'bat0' is enabled
Sun Feb 25 14:20:28 2018 kern.info kernel: [ 7853.038815] device bat0 entered promiscuous mode
Sun Feb 25 14:20:28 2018 kern.info kernel: [ 7853.043649] br-client: port 3(bat0) entered forwarding state
Sun Feb 25 14:20:28 2018 kern.info kernel: [ 7853.049388] br-client: port 3(bat0) entered forwarding state
Sun Feb 25 14:20:28 2018 daemon.notice netifd: Network device 'bat0' link is up
Sun Feb 25 14:20:28 2018 daemon.notice netifd: Interface 'bat0' has link connectivity
Sun Feb 25 14:20:28 2018 daemon.notice netifd: Interface 'bat0' is setting up now
Sun Feb 25 14:20:28 2018 daemon.notice netifd: Interface 'bat0' is now up
~~~~~
This looks like it might have the potential for a race condition?
Also, the "HW filter" remark by 8021q seems a bit odd given that bat0
is a virtual interface, doesn't it?
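
To get a feeling for how small that window is, the kernel timestamps
can be compared. A rough sketch: the function name vlan0_delay is made
up, and it assumes each log entry is on a single line, as logread
prints it, with the kernel timestamp in square brackets.

```shell
# Sketch: seconds between batman-adv activating the hard interface and
# 8021q adding VLAN 0, based on the kernel timestamps in the log.
# Assumption: one log entry per line, timestamp inside "[ ... ]".
vlan0_delay() {
	awk '
		# Return the kernel timestamp found between "[" and "]".
		function ktime(line,    t) {
			t = line
			sub(/^[^[]*\[ */, "", t)
			sub(/\].*$/, "", t)
			return t + 0
		}
		/batman_adv: .*Interface activated/ { start = ktime($0); seen = 1 }
		/8021q: adding VLAN 0/ && seen { printf "%.6f\n", ktime($0) - start; exit }
	'
}

# Intended use on a node: logread | vlan0_delay
```

For the log above, the two timestamps (7852.990712 and 7853.025080)
are about 34 ms apart.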
Regards, Linus