NPF/interface tuning? shell unusable on gateway

Jeremy C. Reed Sat, 26 Mar 2022 11:40:37 -0700

A week ago I switched my router, on the same hardware, from a different 
operating system to NetBSD/amd64 9.2.


It is running a simple NAT gateway using NPF and also runs dhcpd and 
unbound for the internal LAN.

Periodically my shells on this new NetBSD router become unusable -- too 
slow to type.

The interfaces are:

re0 is my WAN
re0 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x03)
re0: interrupting at msix1 vec 0
re0: using 256 tx descriptors
rgephy0 at re0 phy 7: RTL8211B 1000BASE-T media interface

re1 is my LAN
re1 at pci3 dev 1 function 0: RealTek 8169/8110 Gigabit Ethernet (rev. 0x10)
re1: interrupting at ioapic0 pin 16
re1: using 256 tx descriptors
rgephy1 at re1 phy 7: RTL8211C 1000BASE-T media interface

I can reproduce the problem by starting an rsync (over ssh) within my 
LAN transferring to or from outside. I can also reproduce by running 
"speedtest-cli" within my LAN.

I cannot reproduce the problem by doing the rsync or speedtest-cli 
directly on the NetBSD router itself. So it appears not to be the NAT or 
the WAN interface.

While my NetBSD router shell is unusable, I can still use remote SSH 
shells fine. That is the part that confuses me: traffic over the NAT 
and over the WAN is okay. Even an ssh shell on the remote host I am 
rsyncing to or from stays usable while the NetBSD gateway shell is 
unusable (at the same time).

CPU load is low when I have the problem.

With rsync across my gateway, if I use --bwlimit 1400k, the problem is 
noticeable but the shell is somewhat usable. With --bwlimit 1500k or 
faster, the shell is unusable.
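
To compare these limits with the Mb/s figures that systat reports, the --bwlimit values can be converted to bits per second. This is a minimal sketch, assuming rsync treats the "k" suffix as units of 1024 bytes per second and ignoring ssh and TCP/IP overhead; the function name is only illustrative.

```python
# Convert rsync --bwlimit values to Mbit/s.
# Assumption: a "k" suffix means units of 1024 bytes per second.
# Protocol overhead (ssh, TCP/IP, Ethernet framing) is not included.

def bwlimit_to_mbits(kib_per_sec: float) -> float:
    """Return the payload rate in Mbit/s (10**6 bits) for a limit in KiB/s."""
    return kib_per_sec * 1024 * 8 / 1_000_000

for limit in (1400, 1500):
    print(f"--bwlimit {limit}k is about {bwlimit_to_mbits(limit):.2f} Mbit/s")
# prints:
# --bwlimit 1400k is about 11.47 Mbit/s
# --bwlimit 1500k is about 12.29 Mbit/s
```

Under that assumption, the shell becomes unusable somewhere between roughly 11.5 and 12.3 Mbit/s of payload.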

I tried to watch with systat ifstat. It appears to hang when re1 out 
(to my LAN) reaches around 10 Mbits/s to 11 Mbits/s. One time the 
"systat ifstat 0.01" showed it hung at out 10.883 Mb/s, peak: 
12.196 Mb/s. (But since it hangs, it may not have updated in time.)

The shell hangs immediately when doing the rsync. When I suspend the 
rsync, my shell recovers in about 10 seconds. I could reproduce this 
many times.

speedtest-cli over LAN shows Download: 6.34 Mbit/s
systat ifstat 0.01 shows peak 24.312 Mb/s

another speedtest-cli run over LAN Download: 9.95 Mbit/s
systat peak 20.981 Mb/s

A speedtest-cli over the LAN using the same hardware and the same 
interfaces, but a different operating system, showed Download: 62.72 
Mbit/s, but that was six months ago, with a different target "best server".

I can also get 18.816 Mb/s traffic from the gateway (not over NAT nor 
WAN) to LAN and the NetBSD gateway shell is still usable, but noticeably 
laggy. That is about 1.5 times the bandwidth at which routed traffic 
makes the shell unusable. So maybe it is the NPF NAT that is the problem.
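
The 1.5 figure can be checked against the numbers quoted in this message. This is a rough sketch, assuming the baseline is the 12.196 Mb/s peak that systat showed when routed traffic hung the shell; the helper name is hypothetical.

```python
# Figures quoted in this message, in Mb/s.
GATEWAY_TO_LAN = 18.816  # gateway-originated traffic, shell laggy but usable
ROUTED_PEAK = 12.196     # assumed baseline: systat peak when the shell hung

def bandwidth_ratio(local_rate: float, routed_rate: float) -> float:
    """Return how many times larger local_rate is than routed_rate."""
    return local_rate / routed_rate

print(f"{bandwidth_ratio(GATEWAY_TO_LAN, ROUTED_PEAK):.2f}x")  # prints 1.54x
```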

My npf.conf is:

$ext_if = "re0"
$int_if = "re1"
$ext_addrs = { ifaddrs($ext_if) }
$localnet = { 172.16.1.0/24 }

alg "icmp"

map inet4($ext_if) dynamic $localnet -> inet4($ext_if)

group "external" on $ext_if {
    pass stateful out all
    block in all
}

group "internal" on $int_if {
    pass final all
}

group default {
    pass final on lo0 all
    block all
}

I am unsure if NPF is the problem. Maybe my interface has a problem, 
but I could log in and use the shell on the system locally without 
trouble many times before I put NetBSD on it.

Any suggestions on tuning so my shell on the router is usable?

Here is "systat vmstat 0.01" when it hangs:

    4 users    Load  0.12  0.05  0.05                  Sat Mar 26 18:31:58

Proc:r  d  s        Csw  Traps SysCal  Intr   Soft  Fault     PAGING   SWAPPING
        1  6        114          1193  1200   1000            in  out   in  out
                                                        ops
  14.3% Sy   0.0% Us   0.0% Ni   3.6% In  82.1% Id    pages
|    |    |    |    |    |    |    |    |    |    |
=======%%                                                                 forks
                                                                          fkppw
Anon       130180   4%   zero   302356      1250 Interrupts               fksvm
Exec        24816    %   wired      24           TLB shootdown            pwait
File      1831888  61%   inact  671384       100 cpu0 timer               relck
Meta       409088    %   bufs    89448       336 ioapic0 pin 16           rlkok
 (kB)        real   swaponly      free           ioapic0 pin 18           noram
Active    1315476               331500       814 msix1 vec 0              ndcpy
Namei         Sys-cache     Proc-cache           ioapic0 pin 23           fltcp
    Calls     hits    %     hits     %           ioapic0 pin 19           zfod
        6        6  100                                                   cow
                                                                      512 fmin
  Disks:     sd0     wd0     dk0     dk1                              682 ftarg
 seeks                                                                    itarg
 xfers                                                                    flnan
 bytes                                                                    pdfre
 %busy
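
The interrupt column in this display can be broken down per source. This is a small sketch using only the numbers shown above; the mapping of ioapic0 pin 16 to re1 and msix1 vec 0 to re0 comes from the dmesg lines earlier in this message, and the function name is only illustrative.

```python
# Interrupt rates copied from the systat vmstat display above.
# Device mapping follows the dmesg lines: re1 interrupts at ioapic0 pin 16,
# re0 interrupts at msix1 vec 0.
interrupts = {
    "cpu0 timer": 100,
    "ioapic0 pin 16 (re1, LAN)": 336,
    "msix1 vec 0 (re0, WAN)": 814,
}
REPORTED_TOTAL = 1250  # the "Interrupts" line in the same display

def shares(rates: dict) -> dict:
    """Return each source's fraction of the summed interrupt rate."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}

# The three listed sources account for the whole reported total.
assert sum(interrupts.values()) == REPORTED_TOTAL

for name, share in shares(interrupts).items():
    print(f"{name:28s} {share:6.1%}")
```

By this breakdown the two network interfaces account for all interrupts other than the timer, with re0 (WAN) raising more than twice as many as re1 (LAN).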

Any suggestions on how I can better diagnose this?
