I am sorry.

A somewhat different but at least visible version of the Test setup is available here: https://datatracker.ietf.org/doc/html/draft-ietf-bmwg-benchmarking-stateful#test_setup_sfnat64_multi

Gábor

On 9/3/2023 8:45 PM, Gabor LENCSE wrote:
Dear List Members,

I have a weird problem when I try to ssh to an OpenBSD server. (I use OpenBSD 7.3 with the GENERIC.MP #1125 kernel.)

I perform benchmarking tests to measure the performance of OpenBSD PF. I use the test setup below:

2001:2::[0000-ffff]:2/64            198.19.0.0/15 - 198.19.255.254/15
           \  +--------------------------------------+  /
  IPv6      \ |Initiator                    Responder| /
+-------------|                Tester                |<------------+
| addresses   |                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
| 2001:2::1/64|                 DUT:                 | public IPv4 |
+------------>|        Stateful NAT64 gateway        |-------------+
 IPv6 address |     [connection tracking table]      | \
              +--------------------------------------+  \
                                                      198.18.0.1/15

(As for the actual tests, I use only sub-ranges from the potential IP address ranges shown above.)

The Tester runs on a Linux server. During my tests, a bash shell script (running on the Linux server) executes various commands on the DUT (Device Under Test), which is the OpenBSD server. To that end, I use ssh with key-based authentication. Usually everything goes well, but after a while, things "go wrong", and I cannot ssh from the Linux server to the OpenBSD server any more. I get the following error message:

root@tester:~/siitperf# ssh 172.16.17.102
ssh_exchange_identification: read: Connection reset by peer
root@tester:~/siitperf#

Then I cannot even ssh from the OpenBSD server to itself:

dut# ssh localhost
getsockname failed: Connection reset by peer
banner exchange: Connection to 127.0.0.1 port -1: Broken pipe
dut# ssh 172.16.17.102
getsockname failed: Connection reset by peer
banner exchange: Connection to UNKNOWN port -1: Broken pipe
dut#

To be able to perform the tests, my scripts set various things, and perhaps one of them is the culprit, but I cannot find it. The scripts reside in the /root/DUT-settings directory of the OpenBSD server, and I run them over ssh from the bash shell script running on the Tester. The relevant scripts are:

dut# pwd
/root/DUT-settings

dut# cat set-nat64-varip # this one sets static NDP and ARP entries
/root/DUT-settings/set-ndm-left 0 3999
/root/DUT-settings/set-arpm-right 2 1001

dut# cat set-ndm-left
for i in $(seq $1 $2)
do
  h=$(printf "%x" $i)
  ndp -s 2001:2::$h:2 24:6e:96:3c:3f:40 permanent
done

dut# cat set-ndm-right
for i in $(seq $1 $2)
do
  h=$(printf "%x" $i)
  ndp -s 2001:2:0:8000::$h:2 24:6e:96:3c:3f:42 permanent
done

dut# cat set-pf
pfctl -f /etc/pf-set-nat64

dut# cat /etc/pf-set-nat64
#       $OpenBSD: pf.conf,v 1.55 2017/12/03 20:40:04 sthen Exp $
#
# See pf.conf(5) and /etc/examples/pf.conf

set skip on lo

block return    # block stateless traffic
pass            # establish keep-state

# By default, do not permit remote connections to X11
block return in on ! lo0 proto tcp to port 6000:6010

# Port build user does not need network
block return out log proto {tcp udp} user _pbuild

set skip on em1 # to protect ssh
set limit states 1000000000 # 1000M
set timeout interval 3600 # 1 hour
pass in on ix0 inet6 from any to 64:ff9b::/96 af-to inet from 198.19.0.1

dut#
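
The address loops in set-ndm-left and set-ndm-right above can be checked without touching the neighbor cache. The following is only a minimal dry-run sketch: print_ndp_addrs is a hypothetical helper (it is not one of the scripts in /root/DUT-settings), and it merely prints the addresses that such a loop would pass to ndp.

```sh
#!/bin/sh
# Dry-run sketch: prints the addresses that a set-ndm-* style loop would
# configure, without calling ndp. print_ndp_addrs is a hypothetical helper,
# not one of the scripts in /root/DUT-settings.

# print_ndp_addrs PREFIX FIRST LAST
print_ndp_addrs() {
  prefix=$1
  n=$2
  last=$3
  while [ "$n" -le "$last" ]; do
    # Same formatting as the original loops: hex index, then ":2".
    printf '%s%x:2\n' "$prefix" "$n"
    n=$((n + 1))
  done
}
```

For example, "print_ndp_addrs 2001:2:: 0 3999 | wc -l" should count 4000 addresses, running from 2001:2::0:2 to 2001:2::f9f:2. That count could then be compared with the static entries actually present on the DUT after a reboot.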

When everything is set, then the test follows. I have two kinds of tests.

1) Maximum connection establishment rate test. It sends 4M test frames, each with a different combination of source and destination IP addresses, to establish 4M connections. The test uses a binary search to find the highest rate at which all connections are established. (In fact, this is not checked directly. What is checked is that all test frames arrive back at the Tester.)

2) Throughput test. First, the 4M connections are loaded into the connection tracking table of PF. Then comes the throughput test with bidirectional traffic. One elementary test lasts for 60 s. A binary search is used to find the highest rate at which all frames are forwarded.

In both tests, I reboot the DUT after each elementary step of the binary search. The aim is to completely clear the connection tracking table of PF. And, IMHO, the reboot should put the OpenBSD server into a well-defined, clean state, after which it should behave in the same way every time.
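
The binary search described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual siitperf script: trial_passes is a hypothetical stand-in for one elementary test (it just compares the rate with a pretend limit), and find_max_rate with its arguments is likewise made up for this sketch.

```sh
#!/bin/sh
# Illustrative sketch only; not the actual benchmarking script.

# Hypothetical stand-in for one elementary test: succeeds when the rate
# does not exceed a pretend limit (default 1000, an arbitrary value).
trial_passes() {
  [ "$1" -le "${PRETEND_LIMIT:-1000}" ]
}

# find_max_rate LOW HIGH ERROR
# Narrows the interval [LOW, HIGH] until its width is at most ERROR and
# prints LOW, the highest rate known to have passed (assuming LOW passes).
find_max_rate() {
  low=$1
  high=$2
  err=$3
  while [ $((high - low)) -gt "$err" ]; do
    mid=$(( (low + high) / 2 ))
    if trial_passes "$mid"; then
      low=$mid
    else
      high=$mid
    fi
    # In the procedure described above, the DUT is rebooted at this point.
  done
  echo "$low"
}
```

With the pretend limit of 1000, "find_max_rate 0 2048 1" prints 1000. In the real procedure, each call of trial_passes would correspond to one elementary test followed by a reboot of the DUT.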

And now come the weird things. The maximum connection establishment rate test was successful: the binary search was executed 10 times without any problem. As for the throughput test, the binary search completed fully only once. (That means 9 steps.)

Here is the first result:

No, Size, Dir, n, m, Duration, Initial Rate, N, M, R, T, D, Error, Date, Iterations needed, rate
1, 84, b, 2, 2, 60, 200000, 4000000, 4000000, 80000, 500, 51000, 1000, 2023-09-03 18:23:27, 9, 361718
root@tester:~/siitperf#

And when the binary search was executed the second time, it stopped working after the second iteration. This is the relevant part from the nohup.out file:

Preliminary frames received: 4000000
Info: Preliminary phase finished.
Info: Testing initiated at 2023-09-03 18:31:35
Info: Forward sender's sending took 59.9999967862 seconds.
Forward frames sent: 18000000
Info: Reverse sender's sending took 59.9999967887 seconds.
Reverse frames sent: 18000000
Forward frames received: 17726784
Reverse frames received: 17782805
Info: Test finished.
Rebooting the DUT and then waiting for 240 seconds...
ssh_exchange_identification: read: Connection reset by peer^M
Done.

The script waited for 240 s and then continued its work, but from this point on it could never ssh to the DUT again, and thus all its further results are rubbish...
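
One way to keep such rubbish results out of the output would be to let the script check that the DUT is reachable before continuing, instead of only sleeping for a fixed time. The following is a hedged sketch, not part of the script described here: wait_until is a hypothetical helper that retries an arbitrary command a bounded number of times.

```sh
#!/bin/sh
# Hypothetical helper, not part of the actual test script.

# wait_until TRIES DELAY CMD [ARGS...]
# Runs CMD up to TRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, or 1 if every attempt failed.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  attempt=0
  while [ "$attempt" -lt "$tries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Possible use after rebooting the DUT (not executed here):
#   wait_until 24 10 ssh -o BatchMode=yes -o ConnectTimeout=5 172.16.17.102 true \
#     || { echo "DUT is unreachable, stopping" >&2; exit 1; }
```

This does not explain why ssh fails, but it would make the script stop at the first unreachable DUT rather than record meaningless measurements.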

Some more information: I could execute the throughput test without any problem when I used only a single IP address pair and 4M different port number combinations. This makes me think that perhaps the high number of IP addresses (4000 static NDP and 1000 static ARP entries are set) could cause the problem. But I reboot the system after every single step. Why does it not have a clean state then?

Could there be some random event?

Did I make a mistake in pf.conf? I am not familiar with PF, so that is quite possible, too!

Could you please advise me?

Thank you very much in advance!

Best regards,

Gábor

p.s.: Although I do not suspect any hardware problem, I have attached the dmesg of the DUT.

