After upgrading my OpenBSD pf firewall/router I discovered an odd
issue: my email server could no longer send to verizon.net customers.
The strange thing was that mail to other domains still went out. I
traced it to something odd in my ruleset and worked around it, but I
still don't understand what is going on. I am only aware of one
other individual with these symptoms, but he was using a bridge with
pf, and our fixes are at least semantically different.

I have reduced everything to the basic working parts and tested a few
times to narrow down what is happening. In summary, two pass-only
rules are enough to NAT outgoing traffic using carp and rdomains, but
traffic to verizon.net only works if I use a combination of two pass
rules and a match rule. The basic setup where you can see this
behavior follows (public IPs changed to protect the innocent):
# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:90:0b:1f:72:e4
priority: 0
groups: egress
media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
status: active
inet 10.0.0.1 netmask 0xfffffffc broadcast 10.0.0.3
# ifconfig em1
em1: flags=28b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST,NOINET6> rdomain 1 mtu 1500
lladdr 00:90:0b:1f:72:e5
priority: 0
media: Ethernet autoselect (100baseTX full-duplex,rxpause,txpause)
status: active
inet 9.9.9.170 netmask 0xfffffff0 broadcast 9.9.9.175
# ifconfig carp1
carp1: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> rdomain 1 mtu 1500
lladdr 00:00:5e:00:01:09
priority: 0
carp: MASTER carpdev em1 vhid 9 advbase 1 advskew 0
groups: carp
status: master
inet 9.9.9.167 netmask 0xfffffff0 broadcast 9.9.9.175
inet 9.9.9.168 netmask 0xffffffff broadcast 9.9.9.168
# route -T 0 -n show -inet
Routing tables
Internet:
Destination        Gateway            Flags  Refs      Use   Mtu  Prio Iface
default            10.0.0.1           UGS       0        9     -     8 em0
10.0.0.0/30        link#1             UC        2        0     -     4 em0
10.0.0.1           00:90:0b:1f:72:e4  HLc       1        0     -     4 lo0
10.0.0.2           00:14:22:2e:ba:8c  UHLc      0       10     -     4 em0
9.9.9.168          127.0.0.1          UGHS      0        0 33200     8 lo0
127/8              127.0.0.1          UGRS      0        0 33200     8 lo0
127.0.0.1          127.0.0.1          UH        2        0 33200     4 lo0
224/4              127.0.0.1          URS       0        0 33200     8 lo0
# route -T 1 -n show -inet
Routing tables
Internet:
Destination        Gateway            Flags  Refs      Use   Mtu  Prio Iface
default            9.9.9.161          UGS       0       14     -     8 em1
9.9.9.160/28       link#2             UC        1        0     -     4 em1
9.9.9.161          00:1b:54:b7:81:a8  UHLc      1        0     -     4 em1
9.9.9.168/32       9.9.9.168          U         0       10     -     4 carp1
# cat /etc/hostname.em0
inet 10.0.0.1 255.255.255.252 NONE
# cat /etc/hostname.em1
inet 9.9.9.170 255.255.255.240 9.9.9.175 rdomain 1
!route -T 1 add default 9.9.9.161
# cat /etc/hostname.carp1
inet 9.9.9.167 255.255.255.240 9.9.9.175 vhid 9\
pass password rdomain 1
inet alias 9.9.9.168 255.255.255.255
# cat /etc/mygate
10.0.0.1
# cat /etc/pf.conf
set skip on lo
block
# LAN to Internet with three rules and rdomain
# (fixes the verizon issue)
#match out on em1 inet from 10.0.0.2 \
#    to any nat-to 9.9.9.170
#pass out on em1 inet from 9.9.9.170 \
#    to any
#pass in on em0 from 10.0.0.2 \
#    to any rtable 1
# example LAN to Internet with two rules and rdomain
# (doesn't work)
# Seeing TTL expired in transit
#pass in on em0 inet from 10.0.0.2 \
#    to any nat-to 9.9.9.170 rtable 1
#pass out on em1 inet from 9.9.9.170 to any
# Internet access over rdomain and carp
# (creates the verizon issue)
pass in quick on em0 inet from 10.0.0.2 \
    to any nat-to 9.9.9.168 rtable 1
pass out quick on em1 inet from 9.9.9.168 \
    to any
-----------------------------------------------------------------------------------------------
From 10.0.0.2 I run the following commands:
(first a non-verizon smtp server)
telnet 207.155.253.210 25
(works, but a little slower to display the banner under the pass-only rules)
(now one of the relay.verizon.net smtp servers)
telnet 206.46.232.11 25
(fails to connect unless I use the match/pass rule combo)
With the rules above I also found that the two-rule setup doesn't work
at all when I nat-to the physical IP of the public interface in the
rdomain. I have looked at this with tcpdump and I can see the traffic
going out with the proper NAT to either server, but the SYN/ACKs
returning from verizon arrive at the firewall and are not forwarded
to the internal host. One thing I have noticed is that the TTL on the
verizon packets is higher than on those from the other server, but I
don't know why that would make a difference. Given that I see TTL
expired in transit when I nat-to the physical IP of the public
interface, I guess there is some routing ping-pong going on here, or
yet another oversight on my part. Verizon is not my ISP, and my IPs
are not on their or any other blocklist, as the fact that it works
with the right rules shows. Does anyone have an idea why this
configuration fails only some of the time? I would like to better
understand this for future reference.
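
One way to narrow this down further would be to have pf log its
decisions, so that the returning SYN/ACKs can be tied to a specific
rule. What follows is only a minimal sketch, not something I have run:
it is the carp ruleset from above with the "log" keyword added, and it
assumes the pflog0 interface exists on the firewall.

```
# Hypothetical debugging variant of the carp ruleset above.
# Same rules, with "log" added so that packets dropped by the
# default block rule, and the packets that create state on the
# pass rules, are copied to pflog0.
set skip on lo
block log
pass in log quick on em0 inet from 10.0.0.2 \
    to any nat-to 9.9.9.168 rtable 1
pass out log quick on em1 inet from 9.9.9.168 \
    to any
```

With that loaded, "tcpdump -n -e -ttt -i pflog0" should show the rule
number and action for each logged packet, and "pfctl -ss -v" lists the
state entries, which can be compared against the SYN/ACKs seen on em1.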
Oh, and here is the other guy's documentation:
http://wiki.davidpierron.com/doku.php?id=pf:conf
And his solution from an email he sent to me:
>I added the bridge0 interface as defined in rc.conf to the pf.conf and added
>one rule and Verizon connections work now ...
>
>2 lines in pf.conf:
>brg_if="bridge0"
>pass out on $brg_if proto tcp from <mailserver> to any keep state