Hi list,
Okay, I'm doing my tricky VLAN setups again ;) This time I've got an
OpenBSD 4.4 box running inside a VMware ESXi machine. It has two
interfaces, em0 and em1. em0 is the external network, and em1 is the
parent interface for a bunch of VLAN interfaces on the internal side.
On top of the VLAN interfaces (and the external one) I'm running carp,
and this box has a corresponding backup machine on a small LEX machine
(this is my main VPN/admin gateway, with the LEX being the backup if I
need to bring the VMware host down). Yes, I've had to allow promiscuous
mode etc. in VMware for my "All" network interface in order to get
carp working.
Anyway, this has been working fine, and yesterday I added another
vlan interface, without carp on top. This is where the strange stuff
starts. In a nutshell, the above box (Box C) responds to packets
which are not related to it in any way.
Let's take a look at the setup:
Box A, 192.168.131.1, another VMware guest running FreeBSD
vlan63: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether 00:0c:29:51:8d:31
        inet 192.168.131.1 netmask 0xffffff00 broadcast 192.168.131.255
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        vlan: 63 parent interface: em1
Box B, 192.168.131.11, a standalone OpenBSD box
vlan63: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:21:5a:ff:5e:b0
        description: backup-net
        vlan: 63 priority: 0 parent interface: bge0
        groups: vlan
        inet6 fe80::221:5aff:feff:5eb0%vlan63 prefixlen 64 scopeid 0x10
        inet 192.168.131.11 netmask 0xffffff00 broadcast 192.168.131.255
Box C, 192.168.131.8, the VMware OpenBSD guest described above.
vlan63: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0c:29:18:fc:08
        description: backup-net
        vlan: 63 priority: 0 parent interface: em1
        groups: vlan
        inet6 fe80::20c:29ff:fe18:fc08%vlan63 prefixlen 64 scopeid 0xa
        inet 192.168.131.8 netmask 0xffffff00 broadcast 192.168.131.255
The other interfaces on Box C have other 192.168.x nets, but none with
a netmask wider than /24. The 192.168.131.0 net is only on vlan63, and
there are no odd routes either. The box has been rebooted twice.
No interface on Box C has a similar MAC address either.
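The netmask claim can be double-checked with a few lines of Python. This is only a sketch: the address and hex netmask come from Box C's ifconfig output above, while the helper name `overlapping` is made up for illustration, and the networks of the other interfaces have to be filled in by hand since they are not listed in this mail.

```python
import ipaddress

# From Box C's ifconfig output: vlan63 address and hex netmask.
mask = ipaddress.ip_address(0xffffff00)              # 255.255.255.0
vlan63 = ipaddress.ip_interface(f"192.168.131.8/{mask}")

def overlapping(other_nets):
    """Return the entries of other_nets that overlap the vlan63 network.

    other_nets is a list of strings such as "192.168.10.0/24"
    (hypothetical input; the real list is whatever ifconfig shows
    for the other interfaces on Box C).
    """
    return [n for n in other_nets
            if ipaddress.ip_network(n).overlaps(vlan63.network)]

print(vlan63.network)                     # network derived from address + mask
print(vlan63.network.broadcast_address)   # should match the ifconfig broadcast
```

If `overlapping()` returns an empty list for the other interfaces' networks, nothing else on the box covers 192.168.131.0/24.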
Okay, let's look at the problem:
When I SSH from Box A to Box B, that is from 131.1 to 131.11, I expect
to get a connection. Instead my connection is reset immediately. Let's
follow the packet dumps.
The tcpdumps below are from em1 on Box C (tcpdump -vne -i em1
vlan 63; I'll describe later why they are not from -i vlan63).
On first connect, I get this:
13:42:43.951719 00:0c:29:51:8d:31 00:21:5a:ff:5e:b0 8100 78: 802.1Q
vid 63 pri 0 192.168.131.1.56401 > 192.168.131.11.22: S [tcp sum ok]
876330421:876330421(0) win 65535 <mss 1460,nop,wscale
3,sackOK,timestamp 93169894 0> (DF) [tos 0x10] (ttl 64, id 37948, len
60)
Looks ok, 8d:31 (box A) sends to 5e:b0 (Box B). All fine!
Next packet I see is this:
13:42:43.951799 00:0c:29:18:fc:08 00:0c:29:51:8d:31 8100 58: 802.1Q
vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: R [tcp sum ok]
0:0(0) ack 876330422 win 0 (DF) [tos 0x10] (ttl 254, id 32048, len 40)
This is fc:08 (Box C), responding with a reset to 8d:31 (Box A), with
Box B's IP address as the source ??? Wtf..
13:42:43.951817 00:21:5a:ff:5e:b0 00:0c:29:51:8d:31 8100 82: 802.1Q
vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: S [tcp sum ok]
2864936215:2864936215(0) ack 876330422 win 16384 <mss
1460,nop,nop,sackOK,nop,wscale 0,nop,nop,timestamp 3196573386
93169894> (DF) (ttl 254, id 62284, len 64)
And here 5e:b0 (Box B) responds with a SYN/ACK to 8d:31 (Box A), as
expected.
The following is Box C telling Box B that the connection is reset
too (as a response to the above?) or something, and then the
connection is killed and dies off:
13:42:43.951848 00:0c:29:18:fc:08 00:21:5a:ff:5e:b0 8100 58: 802.1Q
vid 63 pri 0 192.168.131.1.56401 > 192.168.131.11.22: R [tcp sum ok]
1:1(0) ack 1 win 0 (DF) [tos 0x10] (ttl 254, id 31057, len 40)
13:42:43.952036 00:0c:29:51:8d:31 00:21:5a:ff:5e:b0 8100 70: 802.1Q
vid 63 pri 0 192.168.131.1.56401 > 192.168.131.11.22: . [tcp sum ok]
ack 1 win 8326 <nop,nop,timestamp 93169894 3196573386> (DF) [tos 0x10]
(ttl 64, id 37950, len 52)
13:42:43.952082 00:0c:29:18:fc:08 00:0c:29:51:8d:31 8100 58: 802.1Q
vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: R [tcp sum ok]
1:1(0) ack 1 win 0 (DF) [tos 0x10] (ttl 254, id 24397, len 40)
13:42:43.952119 00:21:5a:ff:5e:b0 00:0c:29:51:8d:31 8100 60: 802.1Q
vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: R [tcp sum ok]
2864936216:2864936216(0) win 0 (DF) (ttl 254, id 41778, len 40)
13:42:43.952158 00:0c:29:18:fc:08 00:21:5a:ff:5e:b0 8100 74: 802.1Q
vid 63 pri 0 192.168.131.8 > 192.168.131.11: icmp: 192.168.131.1 tcp
port 56401 unreachable (ttl 255, id 62992, len 56)
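To make the odd frames stand out, the trace can be checked mechanically: for every frame, does the source MAC belong to the box that owns the source IP? The following is a rough sketch, not a general tcpdump parser. The MAC/IP pairs are copied from the three ifconfig outputs above, the eight trace lines are unwrapped and cut off after the flags field, and the names `OWNER`, `FRAME` and `mismatched` are made up for this example.

```python
import re

# MAC -> IP, taken from the three ifconfig outputs above.
OWNER = {
    "00:0c:29:51:8d:31": "192.168.131.1",   # Box A
    "00:21:5a:ff:5e:b0": "192.168.131.11",  # Box B
    "00:0c:29:18:fc:08": "192.168.131.8",   # Box C
}

# The eight frames from the trace, unwrapped and trimmed after the flags.
TRACE = """\
13:42:43.951719 00:0c:29:51:8d:31 00:21:5a:ff:5e:b0 8100 78: 802.1Q vid 63 pri 0 192.168.131.1.56401 > 192.168.131.11.22: S
13:42:43.951799 00:0c:29:18:fc:08 00:0c:29:51:8d:31 8100 58: 802.1Q vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: R
13:42:43.951817 00:21:5a:ff:5e:b0 00:0c:29:51:8d:31 8100 82: 802.1Q vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: S
13:42:43.951848 00:0c:29:18:fc:08 00:21:5a:ff:5e:b0 8100 58: 802.1Q vid 63 pri 0 192.168.131.1.56401 > 192.168.131.11.22: R
13:42:43.952036 00:0c:29:51:8d:31 00:21:5a:ff:5e:b0 8100 70: 802.1Q vid 63 pri 0 192.168.131.1.56401 > 192.168.131.11.22: .
13:42:43.952082 00:0c:29:18:fc:08 00:0c:29:51:8d:31 8100 58: 802.1Q vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: R
13:42:43.952119 00:21:5a:ff:5e:b0 00:0c:29:51:8d:31 8100 60: 802.1Q vid 63 pri 0 192.168.131.11.22 > 192.168.131.1.56401: R
13:42:43.952158 00:0c:29:18:fc:08 00:21:5a:ff:5e:b0 8100 74: 802.1Q vid 63 pri 0 192.168.131.8 > 192.168.131.11: icmp:
"""

# time, src MAC, dst MAC, 802.1Q header, src IP[.port] > dst IP[.port]: flags
FRAME = re.compile(
    r"^\S+ (?P<smac>[0-9a-f:]{17}) (?P<dmac>[0-9a-f:]{17}) 8100 \d+: "
    r"802\.1Q vid \d+ pri \d+ "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)(?:\.\d+)? > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)(?:\.\d+)?: (?P<what>\S+)"
)

def mismatched(trace):
    """Yield frames whose source MAC does not own the source IP."""
    for line in trace.splitlines():
        m = FRAME.match(line)
        if m and OWNER.get(m["smac"]) != m["src"]:
            yield m.groupdict()

for f in mismatched(TRACE):
    print(f["smac"], "sent", f["what"], "as", f["src"], "->", f["dst"])
```

Every frame it reports comes from fc:08 (Box C) and is a reset carrying Box A's or Box B's IP address as the source. The final ICMP message is not reported, since there Box C uses its own address.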
Okay.. So.. what the heck is going on here? :) Now, the reason I didn't
run tcpdump directly on vlan63 in this example is that if I put the
vlan63 interface into promiscuous mode, everything works fine! Box C
doesn't intercept my packets or anything.
If I take a look at the pflog0 interface on Box C, I can see that my
block return rule matches the packets on vlan63, if that has anything
to do with it.
I do have IP forwarding enabled on Box C, as this is a router. One
idea I had was that it thinks the packet is to be routed, but PF
blocks it and resets the connection. But why would it think that,
when the destination MAC address is not used on this interface/box?
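That guess can be written down as a toy decision function. It is only a model of the hypothesis, not of how OpenBSD actually processes packets. The names `handle` and `l2_filter` are invented here, and `l2_filter` stands for "something below the IP layer drops frames addressed to another MAC", which is an assumption about what may or may not be happening on vlan63.

```python
# Toy model of the guess above; NOT OpenBSD's actual input path.
BOX_C_MAC = "00:0c:29:18:fc:08"
BOX_C_IP = "192.168.131.8"

def handle(dst_mac, dst_ip, l2_filter, forwarding=True, pf_block_return=True):
    """Return what Box C does with one incoming TCP packet in this model."""
    # Layer 2: a frame for someone else's MAC should normally stop here.
    if l2_filter and dst_mac != BOX_C_MAC:
        return "ignored: frame is for another MAC"
    # Layer 3: from here on, nobody looks at the MAC address any more.
    if dst_ip == BOX_C_IP:
        return "delivered locally"
    if not forwarding:
        return "dropped: not for us, forwarding off"
    if pf_block_return:
        return "blocked by pf: RST sent back to the source"
    return "forwarded"

# Box A's SYN to Box B (dst MAC 5e:b0, dst IP 131.11), as seen by Box C:
syn = ("00:21:5a:ff:5e:b0", "192.168.131.11")
print(handle(*syn, l2_filter=True))
print(handle(*syn, l2_filter=False))
```

In this model the resets only appear when the layer 2 check is missing, which would fit the block return matches in pflog0. It does not explain why putting vlan63 itself into promiscuous mode makes the problem go away.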
So, does anyone have any hints?
For the moment I've worked around the problem by adding another
interface to the VMware guest and running this traffic on that
interface without a vlan device in between. That works flawlessly. But
I thought I'd report this anyway :)
Thanks!
Johan