Just building off my last message. Answering Ryan's questions first:
- Do you have dedicated addresses on the carp parent interfaces?
For sure.
- Are all the carp devices on the master firewall MASTER; what about the
backup?
Before and after the network dies, the primary firewall is all MASTER
and the secondary stays BACKUP.
- Can you reach the 'disappearing' network from the backup firewall?
Yes.
- Is preemption enabled? (sysctl net.inet.carp.preempt=1)
Yes.
- What is the output of 'netstat -sp carp' on both the master and backup
firewalls?
Included below.
- What about the output of 'netstat -i'? Are there output errors on the
offending interface?
Exact output below, but no errors in or out, before or after.
- Have you tried running with carp debugging turned on? (sysctl
net.inet.carp.log=1)
Did this on both firewalls and saw no output either way. Restarted with
it set in sysctl.conf just to be sure, but still didn't see anything.
What else I know:
- With "set debug loud" there is lots of output, but nothing looks
different while the problem is present.
- From the "dead" network, if I ping the firewall, tcpdump shows the
firewall making an arp request for the originating machine.
18:17:50.015307 arp who-has 172.168.120.50 tell 172.168.120.2
172.168.120.50 is the machine on the dead network that was trying to
ping the firewall. This leads me to believe the firewall saw
-something-. Lots of traffic is trying to go to that network, but none
comes back from it.
- I can ping the dead interface locally.
- Bringing the interface down and up doesn't help.
- I can hang that interface from the firewall itself. Previously I was
triggering it from my desktop, through the firewall.
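
One way to check whether those ARP requests ever get answered is to tally requests against replies in a tcpdump capture. This is only a rough sketch: the helper name unanswered_arp is made up for illustration, and it assumes request lines look like the one quoted above and that replies are printed as "<time> arp reply <ip> is-at <mac>".

```shell
#!/bin/sh
# unanswered_arp (hypothetical helper): read tcpdump text output on stdin
# and print each IP that was asked for with "arp who-has" but never showed
# up in an "arp reply", followed by the number of requests seen for it.
# Assumed line layouts:
#   <time> arp who-has <ip> tell <ip>
#   <time> arp reply <ip> is-at <mac>
unanswered_arp() {
    awk '
        $2 == "arp" && $3 == "who-has" { asked[$4]++ }
        $2 == "arp" && $3 == "reply"   { answered[$4] = 1 }
        END {
            for (ip in asked)
                if (!(ip in answered))
                    print ip, asked[ip]
        }
    '
}

# Example: fed only the single request line quoted above.
printf '%s\n' '18:17:50.015307 arp who-has 172.168.120.50 tell 172.168.120.2' |
    unanswered_arp
```

On a live firewall the input could come from a bounded capture on the affected interface, for example "tcpdump -n -c 200 -i gem1 arp" piped into the function; the -c limit matters because the summary is only printed once the input ends.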
Ifconfig explanation:
gem0 - external
gem1 - 120.x - network that "disappears"
hme0 - 0.x - pfsync traffic
hme1 - 121.x - Network my terminal is on
hme2 - 119.x
My ifconfig -A output from the master firewall:
$ ifconfig -A
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33192
groups: lo
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
gem0: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:03:ba:f2:bc:1c
groups: egress
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 216.2.22.123 netmask 0xffffffe0 broadcast 216.82.41.127
inet6 fe80::203:baff:fef2:bc1c%gem0 prefixlen 64 scopeid 0x1
gem1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:03:ba:f2:bc:1d
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 172.168.120.2 netmask 0xffffff00 broadcast 172.168.120.255
inet6 fe80::203:baff:fef2:bc1d%gem1 prefixlen 64 scopeid 0x2
hme0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 08:00:20:ee:66:60
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
inet6 fe80::a00:20ff:feee:6660%hme0 prefixlen 64 scopeid 0x3
hme1: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
lladdr 08:00:20:ee:66:61
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 172.168.121.2 netmask 0xffffff00 broadcast 172.168.121.255
inet6 fe80::a00:20ff:feee:6661%hme1 prefixlen 64 scopeid 0x4
hme2: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
lladdr 08:00:20:ee:66:62
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 172.168.119.2 netmask 0xffffff00 broadcast 172.168.119.255
inet6 fe80::a00:20ff:feee:6662%hme2 prefixlen 64 scopeid 0x5
hme3: flags=8822<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST> mtu 1500
lladdr 08:00:20:ee:66:63
media: Ethernet autoselect
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33192
pfsync0: flags=41<UP,RUNNING> mtu 1348
pfsync: syncdev: hme0 maxupd: 128
enc0: flags=0<> mtu 1536
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
groups: tun
inet 172.168.123.1 --> 172.168.123.2 netmask 0xffffffff
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
carp: MASTER carpdev gem0 vhid 1 advbase 1 advskew 0
groups: carp
inet 216.82.41.116 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.97 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.98 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.117 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.118 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.119 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.120 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.125 netmask 0xffffffe0 broadcast 216.82.41.127
inet 216.82.41.126 netmask 0xffffffe0 broadcast 216.82.41.127
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
carp: MASTER carpdev gem1 vhid 2 advbase 1 advskew 0
groups: carp
inet 172.168.120.1 netmask 0xffffff00 broadcast 172.168.120.255
carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
carp: MASTER carpdev hme1 vhid 3 advbase 1 advskew 0
groups: carp
inet 172.168.121.1 netmask 0xffffff00 broadcast 172.168.121.255
carp3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
carp: MASTER carpdev hme2 vhid 4 advbase 1 advskew 0
groups: carp
inet 172.168.119.1 netmask 0xffffff00 broadcast 172.168.119.255
Now for gratuitous output.
On the MASTER firewall, before killing it:
$ sudo pfctl -s info
Status: Enabled for 0 days 00:04:13 Debug: Urgent
Interface Stats for gem0 IPv4 IPv6
Bytes In 10942552 0
Bytes Out 376220 352
Packets In
Passed 9496 0
Blocked 359 0
Packets Out
Passed 6099 3
Blocked 0 2
State Table Total Rate
current entries 318
searches 37680 148.9/s
inserts 640 2.5/s
removals 322 1.3/s
Counters
match 1160 4.6/s
bad-offset 0 0.0/s
fragment 0 0.0/s
short 0 0.0/s
normalize 0 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 0 0.0/s
proto-cksum 0 0.0/s
state-mismatch 0 0.0/s
state-insert 1 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 0 0.0/s
$ netstat -sp carp
carp:
8 packets received (IPv4)
0 packets received (IPv6)
0 packets discarded for bad interface
0 packets discarded for wrong TTL
0 packets shorter than header
0 discarded for bad checksums
0 discarded packets with a bad version
0 discarded because packet too short
0 discarded for bad authentication
0 discarded for bad vhid
0 discarded because of a bad address list
1040 packets sent (IPv4)
0 packets sent (IPv6)
0 send failed due to mbuf memory error
$ netstat -i
Name Mtu  Network     Address           Ipkts Ierrs Opkts Oerrs Colls
...
gem1 1500 <Link>      00:03:ba:f2:bc:1d   752     0  1122     0     0
gem1 1500 172.168.120 carp0               752     0  1122     0     0
gem1 1500 fe80::%gem1 fe80::203:baff:fe   752     0  1122     0     0
...
On the MASTER after killing it:
$ sudo pfctl -s info
Status: Enabled for 0 days 00:07:41 Debug: Urgent
Interface Stats for gem0 IPv4 IPv6
Bytes In 16115704 0
Bytes Out 557189 352
Packets In
Passed 14501 0
Blocked 670 0
Packets Out
Passed 9332 3
Blocked 0 2
State Table Total Rate
current entries 240
searches 63819 138.4/s
inserts 770 1.7/s
removals 530 1.1/s
Counters
match 1887 4.1/s
bad-offset 0 0.0/s
fragment 0 0.0/s
short 0 0.0/s
normalize 0 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 0 0.0/s
proto-cksum 0 0.0/s
state-mismatch 9 0.0/s
state-insert 5 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 0 0.0/s
$ netstat -sp carp
carp:
8 packets received (IPv4)
0 packets received (IPv6)
0 packets discarded for bad interface
0 packets discarded for wrong TTL
0 packets shorter than header
0 discarded for bad checksums
0 discarded packets with a bad version
0 discarded because packet too short
0 discarded for bad authentication
0 discarded for bad vhid
0 discarded because of a bad address list
1896 packets sent (IPv4)
0 packets sent (IPv6)
0 send failed due to mbuf memory error
$ netstat -ni
Name Mtu  Network     Address           Ipkts Ierrs Opkts Oerrs Colls
...
gem1 1500 <Link>      00:03:ba:f2:bc:1d 10313     0  2881     0     0
gem1 1500 172.168.120 172.168.120.2     10313     0  2881     0     0
gem1 1500 fe80::%gem1 fe80::203:baff:fe 10313     0  2881     0     0
...
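
For repeated checks, the packet and error counters for one interface can be pulled out of the netstat output with a small filter. This is a minimal sketch, not a polished tool: iface_errs is a made-up name, and it assumes the column order shown above with each row arriving on a single line, as netstat prints it on a wide terminal. Rows with an empty Address column would shift the fields.

```shell
#!/bin/sh
# iface_errs IFACE (hypothetical helper): read 'netstat -ni' output on
# stdin and print "Ipkts Ierrs Opkts Oerrs" from the <Link> row of IFACE.
# Assumed columns: Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
iface_errs() {
    awk -v ifn="$1" '$1 == ifn && $3 == "<Link>" { print $5, $6, $7, $8 }'
}

# Sample input: the gem1 <Link> row from the "after" output above.
printf '%s\n' 'gem1 1500 <Link> 00:03:ba:f2:bc:1d 10313 0 2881 0 0' |
    iface_errs gem1
```

Against a live system the same function would be fed with "netstat -ni | iface_errs gem1", run before and after the network dies, so the two result lines can be compared directly.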
BACKUP firewall before:
$ netstat -sp carp
carp:
1084 packets received (IPv4)
0 packets received (IPv6)
0 packets discarded for bad interface
0 packets discarded for wrong TTL
0 packets shorter than header
0 discarded for bad checksums
0 discarded packets with a bad version
0 discarded because packet too short
0 discarded for bad authentication
0 discarded for bad vhid
0 discarded because of a bad address list
168 packets sent (IPv4)
0 packets sent (IPv6)
0 send failed due to mbuf memory error
BACKUP after:
$ netstat -sp carp
carp:
2512 packets received (IPv4)
0 packets received (IPv6)
0 packets discarded for bad interface
0 packets discarded for wrong TTL
0 packets shorter than header
0 discarded for bad checksums
0 discarded packets with a bad version
0 discarded because packet too short
0 discarded for bad authentication
0 discarded for bad vhid
0 discarded because of a bad address list
168 packets sent (IPv4)
0 packets sent (IPv6)
0 send failed due to mbuf memory error
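
To make the before/after comparison easier to read, the growth of each CARP counter can be worked out from the figures above. A small sketch, with delta being a made-up helper name; the numbers are copied from the 'netstat -sp carp' outputs in this message.

```shell
#!/bin/sh
# delta BEFORE AFTER (hypothetical helper): print how much a counter
# grew between two samples.
delta() {
    echo $(( $2 - $1 ))
}

# Values copied from the 'netstat -sp carp' outputs above.
echo "master sent:     $(delta 1040 1896)"
echo "master received: $(delta 8 8)"
echo "backup sent:     $(delta 168 168)"
echo "backup received: $(delta 1084 2512)"
```

The backup sending nothing new and the master receiving nothing new between samples is consistent with the backup staying BACKUP the whole time. The sent and received figures are not directly comparable with each other, since the two machines were not sampled at the same instants.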