Thanks to everyone for your advice! I'll try to respond to all the
questions at once and provide some more information about the testing
that I did today.

The BIOS on these firewalls is current. For power-saving options, when
I first configured these systems I tried turning Intel EIST
(SpeedStep) off, but this caused OpenBSD to panic during boot. The
panic text is copied at the end of this message, but the keyboard
didn't work at the ddb prompt (not even Ctrl-Alt-Del), so I couldn't
run any commands. Here's what my performance-related BIOS settings
look like:

Hyper-threading: Disabled
Active Processor Cores: All
Limit CPUID Maximum: Disabled
Execute Disable Bit: Enabled
Intel Virtualization Technology: Disabled
Hardware Prefetcher: Enabled
Adjacent Cache Line Prefetch: Enabled
EIST: Enabled
Turbo Mode: Enabled
CPU C3 Report: Disabled
CPU C6 Report: Disabled
CPU C7 Report: Disabled
VT-d: Disabled

I doubt that disabling EIST would have a significant performance
advantage. Latency may suffer a bit while the CPU raises its frequency
when the traffic hits, but I don't think this would affect throughput
testing. Tomorrow, I'll try disabling the other cores and using the
bsd.sp kernel to see if that performs any better. I might also play
with the hardware prefetcher settings.

Today, I started testing forwarding performance with pf enabled. I put
the second firewall aside and installed the X540-T2 cards into four
identical Dell OptiPlex 9010 desktops: two "servers" (s1 & s2) and two
"clients" (c1 & c2). Each pair was connected through its own Dell
PowerConnect 8164 10GbE switch to a separate port on the firewall. The
two switches had no other connections. I installed FreeBSD 9.1-RELEASE
amd64 on the desktops.

As a side note, iperf doesn't crash on FreeBSD when running in UDP
mode, so I think it's a problem with the OpenBSD package. For these
tests I stuck with TCP and 1500 MTU. Also, I noticed that a 10-second
test is not always sufficient to get consistent results, so I'm now
running all tests for 60 seconds.

First test is iperf on 127.0.0.1 to compare these desktops with the
11.6 Gbps that I got on the firewall:

# c1: iperf -s
# c1: iperf -c 127.0.0.1 -t 60
[  3]  0.0-59.9 sec   402 GBytes  57.7 Gbits/sec

That's... a bit faster. The CPU in the desktops is an Intel i7-3770,
which is very similar to the Xeon E3-1275v2. Is this a FreeBSD vs
OpenBSD difference?

Second test is c1 -> c2 via the 8164 switch (not involving the firewall yet):

# c2: iperf -s
# c1: iperf -c c2 -t 60
[  4]  0.0-60.1 sec  40.2 GBytes  5.74 Gbits/sec

A single desktop can't saturate the link, at least with the default
settings, but two on each side should be plenty to test the firewall
to its limit.

Third test is c1 -> s1 through the firewall with pf stateful filtering:

# s1: iperf -s
# c1: iperf -c s1 -t 60
[  3]  0.0-60.0 sec  30.0 GBytes  4.29 Gbits/sec

I watched systat and top on the firewall while this test was running.
They showed 16k interrupts, evenly split between ix0 and ix1, and ~90%
interrupt usage on CPU0.
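
As a sanity check on the three single-stream results so far, here is a
small Python sketch that recomputes the rate from the transferred
volume and the duration. It assumes that iperf reports GBytes as 2^30
bytes and Gbits as 10^9 bits; that assumption is mine, but the numbers
above only line up that way. The function name is made up for
illustration.

```python
# Recompute iperf's Gbits/sec column from its GBytes and seconds columns.
# Assumption: 1 GByte = 2**30 bytes, 1 Gbit = 10**9 bits.

def gbits_per_sec(gbytes, seconds):
    """Return the average rate in Gbits/sec for a finished transfer."""
    bits = gbytes * 2**30 * 8
    return bits / seconds / 1e9

# (label, GBytes, seconds, rate reported by iperf), copied from the tests
results = [
    ("c1 loopback", 402.0, 59.9, 57.7),
    ("c1 -> c2 via switch", 40.2, 60.1, 5.74),
    ("c1 -> s1 via firewall", 30.0, 60.0, 4.29),
]

for label, gbytes, seconds, reported in results:
    computed = gbits_per_sec(gbytes, seconds)
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f} Gbits/sec")
```

The computed and reported values differ only by rounding of the GBytes
column, so the summary lines are at least internally consistent.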

Fourth test is c1 -> s1 and c2 -> s2. I used a netcat server on the
firewall (nc -l 1234) to synchronize both clients. They started iperf
as soon as I killed the server with Ctrl-C:

# s1: iperf -s
# s2: iperf -s
# c1: nc gw 1234; iperf -c s1 -t 60
# c2: nc gw 1234; iperf -c s2 -t 60
[  3]  0.0-60.0 sec  14.4 GBytes  2.07 Gbits/sec
[  3]  0.0-60.0 sec  15.8 GBytes  2.26 Gbits/sec

A roughly even split of the single-client performance (4.33 Gbits/sec
combined vs. 4.29), indicating that the firewall is the bottleneck. No
changes in systat and top, so it does look like the CPU is the
limiting factor.

Finally, I used "set skip on {ix0, ix1}" to disable pf on these two
interfaces and re-ran the same test:

[  3]  0.0-60.0 sec  18.1 GBytes  2.59 Gbits/sec
[  3]  0.0-60.0 sec  16.3 GBytes  2.34 Gbits/sec

A small improvement (4.93 Gbits/sec combined vs. 4.33), but I think
it's fair to say that pf isn't the problem.
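
For anyone who wants to check the comparison, this is a minimal Python
sketch of the arithmetic, using only the iperf rates quoted above. The
variable names are made up for illustration.

```python
# Aggregate throughput through the firewall, in Gbits/sec, from the
# iperf summary lines in this message.
single_client = 4.29          # c1 -> s1 alone, pf enabled
with_pf = [2.07, 2.26]        # c1 -> s1 and c2 -> s2, pf enabled
skip_pf = [2.59, 2.34]        # same test with "set skip" on ix0 and ix1

total_pf = sum(with_pf)
total_skip = sum(skip_pf)

# Relative gain from taking pf out of the forwarding path.
gain_pct = (total_skip - total_pf) / total_pf * 100

print(f"one client, pf on:   {single_client:.2f}")
print(f"two clients, pf on:  {total_pf:.2f}")
print(f"two clients, pf off: {total_skip:.2f}")
print(f"gain without pf:     {gain_pct:.1f}%")
```

Two clients with pf on add up to almost exactly the single-client
rate, and skipping pf gains about 14%, which still leaves the total
at roughly half of the 10GbE line rate.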

Will do some more testing tomorrow. Here's the boot panic when I
disable SpeedStep in BIOS:

acpiec0 at acpi0: Failed to read resource settings
acpicpu0 at acpi0Store to default type! 100

01a4 Called: \_PR_.CPU0._PDC
  arg0: 0xffff8000001af588 cnt:01 stk:00 buffer: 0c {01, 00, 00, 00, 01, 00, 00, 00, 3b, 03, 00, 00}
panic: aml_die aml_store:2621
Stopped at      Debugger+0x5:  leave
Debugger() at Debugger+0x5
panic() at panic+0xe4
_aml_die() at _aml_die+0x183
aml_store() at aml_store+0xbb
aml_parse() at aml_parse+0xcd7
aml_eval() at aml_eval+0x1c8
aml_evalnode() at aml_evalnode+0x63
acpicpu_set_pdc() at acpicpu_set_pdc+0x8c
acpicpu_attach() at acpicpu_attach+0x9e
config_attach() at config_attach+0x1d4
end trace frame: 0xffffffff80e6da90, count: 0
