On 11/12/2016 17:22, chris g wrote:
Hello,
I've decided to write here, as we had no luck troubleshooting PF's
poor performance on a 1GbE interface.
The network layout, simplified as much as possible, is:
ISP <-> BGP ROUTER <-> PF ROUTER with many rdr rules <-> LAN
The problem is reproducible on any of the PF router's connections - both to the LAN and to the BGP router.
Hardware, OS versions, and tunables of both the BGP and PF routers:
Hardware: E3-1230 V2 with HT on, 8GB RAM, ASUS P8B-E, NICs: Intel I350 on PCIe
FreeBSD versions tested: 9.2-RELEASE amd64 with Custom kernel,
10.3-STABLE(compiled 4th Dec 2016) amd64 with Generic kernel.
Basic tunables (for 9.2-RELEASE):
net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
kern.ipc.somaxconn=65535
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
kern.polling.idle_poll=1
BGP router doesn't have any firewall.
PF options of PF router are:
set state-policy floating
set limit { states 2048000, frags 2000, src-nodes 384000 }
set optimization normal
Problem description:
We are experiencing low throughput when PF is enabled with all the
rdr rules. If 'set skip' is applied to the benchmarked interface, or the
rdr rules are commented out, the bandwidth is flawless. No scrubbing
is done in PF, and most of the roughly 2500 rdr rules look like this
(note that no interface is specified, and that is intentional):
rdr pass inet proto tcp from any to 1.2.3.4 port 1235 -> 192.168.0.100 port 1235
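To quantify how much linear-search work those redirects cause, the per-rule counters can be inspected. A diagnostic sketch (on FreeBSD the rdr rules live in the translation ruleset; exact output format varies by version):

```shell
# Show translation (rdr/nat) rules with cumulative counters;
# "Evaluations" grows for every rule pf had to walk past.
pfctl -v -s nat | head -n 20

# The equivalent for the filter ruleset:
pfctl -v -s rules | head -n 20
```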
All measurements were taken using iperf 2.0.5 with options "-c <IP>"
or "-c <IP> -m -t 60 -P 8" on the client side and "-s" on the server
side. We tested both directions.
Please note that this is a production environment and there was some
other traffic (say 20-100 Mbps) on the benchmarked interfaces during
both tests, so iperf will not show the full Gigabit. There is no networking
equipment between 'client' and 'server' - just two NICs connected
directly with a Cat6 cable.
Without further ado, here are benchmark results:
server's PF enabled with fw rules but without rdr rules:
root@client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[ 3] local clients_ip port 51361 connected with server port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 936 Mbits/sec
server's PF enabled with fw rules and around 2500 redirects present:
root@client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[ 3] local clients_ip port 45671 connected with server port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 402 MBytes 337 Mbits/sec
That much of a difference is 100% reproducible in the production
environment. The result varies with the time of day: 160-400 Mbps
with the rdr rules present, and always above 900 Mbps with them
disabled.
Some additional information:
# pfctl -s info
Status: Enabled for 267 days 10:25:22 Debug: Urgent
State Table Total Rate
current entries 132810
searches 5863318875 253.8/s
inserts 140051669 6.1/s
removals 139918859 6.1/s
Counters
match 1777051606 76.9/s
bad-offset 0 0.0/s
fragment 191 0.0/s
short 518 0.0/s
normalize 0 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 4383 0.0/s
proto-cksum 0 0.0/s
state-mismatch 52574 0.0/s
state-insert 172 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 0 0.0/s
# pfctl -s states | wc -l
113705
# pfctl -s memory
states hard limit 2048000
src-nodes hard limit 384000
frags hard limit 2000
tables hard limit 1000
table-entries hard limit 200000
# pfctl -s Interfaces|wc -l
75
# pfctl -s rules | wc -l
1226
In our opinion the hardware is not too weak: CPU usage sits at only
10-30% and does not reach 100% during the benchmark; not even a
single vcore is pegged at 100%.
I would be really grateful if someone could point me in the right direction.
PF uses a linear search (with some optimizations to skip over rules
which cannot match) to establish new flows. If your PF config is really
that simple, give IPFW a try. While PF has much nicer syntax, IPFW
supports more powerful tables: IPFW tables are key-value maps, and the
value can be used as an argument to most actions. That may reduce your
2500 rule evaluations to a single table lookup. If you can afford to
lose the source IP and port, you could use a userspace TCP proxy.
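A minimal sketch of the tablearg approach (interface name, table and nat-instance numbers are hypothetical; requires in-kernel NAT support):

```shell
# Load ipfw and its in-kernel NAT module, if not compiled in.
kldload ipfw ipfw_nat

# One nat instance per redirect target; the instance number is
# stored as the table value, so it becomes the "tablearg".
ipfw nat 100 config redirect_port tcp 192.168.0.100:1235 1.2.3.4:1235
ipfw table 1 add 1.2.3.4/32 100

# A single rule dispatches every matching packet via one table
# lookup instead of a linear walk over ~2500 rdr rules.
ipfw add 1000 nat tablearg ip from any to 'table(1)' in recv igb0
```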
_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "[email protected]"