I am sending the figure again. This time I made sure to use only spaces, not tabs:
2001:2::[0000-ffff]:2/64                     198.19.0.0/15 - 198.19.255.254/15
           \  +--------------------------------------+  /
       IPv6 \ |Initiator                    Responder| /
+-------------|                Tester                |<------------+
| addresses   |                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
| 2001:2::1/64|                 DUT:                 | public IPv4 |
+------------>|        Stateful NAT64 gateway        |-------------+
 IPv6 address |     [connection tracking table]      | \
              +--------------------------------------+  \
                                                    198.18.0.1/15
Gábor
On 9/3/2023 8:45 PM, Gabor LENCSE wrote:
Dear List Members,
I have a weird problem when I try to ssh to an OpenBSD server. (I use
OpenBSD 7.3 with the GENERIC.MP #1125 kernel.)
I perform benchmarking tests to measure the performance of OpenBSD
PF. I use the below test setup:
2001:2::[0000-ffff]:2/64                     198.19.0.0/15 - 198.19.255.254/15
           \  +--------------------------------------+  /
       IPv6 \ |Initiator                    Responder| /
+-------------|                Tester                |<------------+
| addresses   |                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
| 2001:2::1/64|                 DUT:                 | public IPv4 |
+------------>|        Stateful NAT64 gateway        |-------------+
 IPv6 address |     [connection tracking table]      | \
              +--------------------------------------+  \
                                                    198.18.0.1/15
(As for the actual tests, I use only sub-ranges from the potential IP
address ranges shown above.)
The Tester is executed on a Linux server. During my tests, a bash
shell script (running on the Linux server) executes various commands
on the DUT (Device Under Test), which is the OpenBSD server. To that
end, I use ssh with key-based authentication. Usually everything goes
well, but after a while, things "go wrong", and I cannot ssh from the
Linux server to the OpenBSD server any more. I get the following error
message:
root@tester:~/siitperf# ssh 172.16.17.102
ssh_exchange_identification: read: Connection reset by peer
root@tester:~/siitperf#
Then I cannot even ssh from the OpenBSD server to itself:
dut# ssh localhost
getsockname failed: Connection reset by peer
banner exchange: Connection to 127.0.0.1 port -1: Broken pipe
dut# ssh 172.16.17.102
getsockname failed: Connection reset by peer
banner exchange: Connection to UNKNOWN port -1: Broken pipe
dut#
To be able to perform the tests, I set various things by my scripts,
and perhaps one of them could be the culprit, but I cannot find it. I
execute the scripts in the /root/DUT-settings directory of the OpenBSD
server from the bash shell script running on the tester using ssh. The
relevant scripts are:
dut# pwd
/root/DUT-settings
dut# cat set-nat64-varip # this one sets static NDP and ARP entries
/root/DUT-settings/set-ndm-left 0 3999
/root/DUT-settings/set-arpm-right 2 1001
dut# cat set-ndm-left
for i in $(seq $1 $2)
do
h=$(printf "%x" $i)
ndp -s 2001:2::$h:2 24:6e:96:3c:3f:40 permanent
done
dut# cat set-ndm-right
for i in $(seq $1 $2)
do
h=$(printf "%x" $i)
ndp -s 2001:2:0:8000::$h:2 24:6e:96:3c:3f:42 permanent
done
dut# cat set-pf
pfctl -f /etc/pf-set-nat64
dut# cat /etc/pf-set-nat64
# $OpenBSD: pf.conf,v 1.55 2017/12/03 20:40:04 sthen Exp $
#
# See pf.conf(5) and /etc/examples/pf.conf
set skip on lo
block return # block stateless traffic
pass # establish keep-state
# By default, do not permit remote connections to X11
block return in on ! lo0 proto tcp to port 6000:6010
# Port build user does not need network
block return out log proto {tcp udp} user _pbuild
set skip on em1 # to protect ssh
set limit states 1000000000 # 1000M
set timeout interval 3600 # 1 hour
pass in on ix0 inet6 from any to 64:ff9b::/96 af-to inet from 198.19.0.1
dut#
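The set-ndm-left and set-ndm-right loops above build each neighbor address by formatting the loop counter as a hexadecimal group. As a minimal sketch (the function name gen_addrs is made up for illustration and is not part of the original scripts), the following prints the addresses instead of passing them to ndp, which makes it possible to check the generated range without touching the neighbor cache:

```shell
#!/bin/sh
# Hypothetical helper: print the IPv6 addresses that a set-ndm-* style
# loop would pass to "ndp -s", without modifying the neighbor cache.
# Usage: gen_addrs PREFIX FIRST LAST
gen_addrs() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        # %x renders the counter as lowercase hex, e.g. 10 -> a
        printf '%s%x:2\n' "$prefix" "$i"
        i=$((i + 1))
    done
}
```

For example, `gen_addrs 2001:2:: 0 3999 | wc -l` should report 4000 addresses, matching the argument range used in set-nat64-varip.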
Once everything is set, the test follows. I have two kinds of tests.
1) Maximum connection establishment rate test. It sends 4M test frames
with all different source IP address and destination IP address
combinations to establish 4M connections. The test uses a binary
search to find the highest rate at which all connections are
established. (In fact, connection establishment itself is not checked.
What is checked is that all test frames arrive back at the Tester.)
2) Throughput test. First, the 4M connections are loaded into the
connection tracking table of PF. Then comes the throughput test with
bidirectional traffic. One elementary test lasts for 60 s. A binary
search is used to find the highest rate at which all frames are
forwarded.
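The binary search used in both tests can be sketched as follows. This is only an illustration under stated assumptions, not the actual measurement script: find_max_rate and run_trial are hypothetical names, and run_trial stands for one elementary test that returns 0 when every frame sent at the given rate arrived back at the Tester.

```shell
#!/bin/sh
# Sketch of the rate search. run_trial is a hypothetical hook: it must
# return 0 when every frame sent at the given rate was received.
# Usage: find_max_rate LOW HIGH TOLERANCE
#   LOW is a rate assumed to pass, HIGH a rate assumed to fail.
# Note: plain sh has no local variables, so run_trial must not reuse
# the names lo, hi, tol or mid.
find_max_rate() {
    lo=$1
    hi=$2
    tol=$3
    while [ $((hi - lo)) -gt "$tol" ]; do
        mid=$(( (lo + hi) / 2 ))
        if run_trial "$mid"; then
            lo=$mid      # all frames arrived: search higher
        else
            hi=$mid      # frames were lost: search lower
        fi
    done
    # lo is the highest rate known to pass, within the tolerance
    echo "$lo"
}
```

The loop keeps the invariant that lo always passes and hi always fails, so the search stops once the two bounds are within the requested tolerance of each other.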
In both tests, I reboot the DUT after each elementary step
of the binary search. The aim is to completely clear the connection
tracking table of PF. And, IMHO, it should put the OpenBSD server into
a well-defined, clean state, after which it should behave in the
same way every time.
And now come the weird things. The maximum connection establishment
rate test was successful. The binary search was executed 10 times
without any problem. As for the throughput test, the binary search was
completed once in full. (That means 9 steps.)
Here is the first result:
No, Size, Dir, n, m, Duration, Initial Rate, N, M, R, T, D, Error, Date, Iterations needed, rate
1, 84, b, 2, 2, 60, 200000, 4000000, 4000000, 80000, 500, 51000, 1000, 2023-09-03 18:23:27, 9, 361718
root@tester:~/siitperf#
And when the binary search was executed the second time, it stopped
working after the second iteration. This is the relevant part from the
nohup.out file:
Preliminary frames received: 4000000
Info: Preliminary phase finished.
Info: Testing initiated at 2023-09-03 18:31:35
Info: Forward sender's sending took 59.9999967862 seconds.
Forward frames sent: 18000000
Info: Reverse sender's sending took 59.9999967887 seconds.
Reverse frames sent: 18000000
Forward frames received: 17726784
Reverse frames received: 17782805
Info: Test finished.
Rebooting the DUT and then waiting for 240 seconds...
ssh_exchange_identification: read: Connection reset by peer^M
Done.
The script waited for 240s and then continued the work, but from this
point it could never ssh to the DUT again, and thus all its further
results are rubbish...
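One way to keep a run from continuing with meaningless results is to poll the DUT after the reboot and abort when it never becomes reachable, rather than sleeping for a fixed time. The following is a minimal sketch under stated assumptions: wait_for_dut, ssh_probe and the PROBE override are hypothetical names introduced here, and the probe simply runs the no-op command true over ssh with the standard BatchMode and ConnectTimeout options.

```shell
#!/bin/sh
# Sketch: poll the DUT after a reboot instead of sleeping a fixed time.
# PROBE is a hypothetical override hook so the loop can be exercised
# without a real host; by default a no-op command is run over ssh.
ssh_probe() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage: wait_for_dut HOST MAX_TRIES DELAY_SECONDS
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_for_dut() {
    host=$1
    tries=$2
    delay=$3
    n=0
    while [ "$n" -lt "$tries" ]; do
        if ${PROBE:-ssh_probe} "$host" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}
```

A calling script could then use something like `wait_for_dut 172.16.17.102 48 5 || exit 1` so that the binary search stops at the first unreachable DUT. This only detects the failure; it does not explain why sshd resets the connection.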
Some more information: I could execute the throughput test without any
problem when I used only a single IP address pair and 4M different
port number combinations. This makes me think that perhaps the use
of a large number of IP addresses (4000 static NDP and 1000 static
ARP entries are set) could cause the problem. But I reboot the system
after every single step. Why does it not have a clean state then?
Could there be some random event?
Did I make a mistake in pf.conf? I am not familiar with PF, so that
is quite possible, too!
Could you please advise me?
Thank you very much in advance!
Best regards,
Gábor
p.s.: Although I do not suspect any hardware problem, I have attached
the dmesg of the DUT.