A week ago, on the same hardware, I switched my router from a different operating system to NetBSD/amd64 9.2.
It runs a simple NAT gateway using NPF, and also runs dhcpd and unbound for the internal LAN. Periodically my shells on this new NetBSD router become unusable: too slow to type in.

The interfaces are:

re0 is my WAN

    re0 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x03)
    re0: interrupting at msix1 vec 0
    re0: using 256 tx descriptors
    rgephy0 at re0 phy 7: RTL8211B 1000BASE-T media interface

re1 is my LAN

    re1 at pci3 dev 1 function 0: RealTek 8169/8110 Gigabit Ethernet (rev. 0x10)
    re1: interrupting at ioapic0 pin 16
    re1: using 256 tx descriptors
    rgephy1 at re1 phy 7: RTL8211C 1000BASE-T media interface

I can reproduce the problem by starting an rsync (over ssh) from a host within my LAN, transferring to or from the outside. I can also reproduce it by running "speedtest-cli" within my LAN.

I cannot reproduce the problem by running the rsync or speedtest-cli directly on the NetBSD router itself. So it appears not to be the NAT or the WAN interface.

While my NetBSD router shell is unusable, I can still use remote SSH shells fine. That is the part that confuses me: traffic over the NAT and over the WAN is okay. Even an ssh shell on the remote host I am rsyncing to or from is usable while the NetBSD gateway shell is unusable (at the same time).

CPU load is low while the problem occurs.

With rsync across my gateway, if I use --bwlimit 1400k, the problem is noticeable but the shell is somewhat usable. With --bwlimit 1500k or faster, the shell is unusable.

I tried to watch with systat ifstat. It appears to hang when re1 out (to my LAN) reaches around 10 Mbit/s to 11 Mbit/s. One time "systat ifstat 0.01" showed that it hung at out 10.883 Mb/s, peak: 12.196 Mb/s. (But since it hangs, the display may not have updated in time.)

The shell hangs immediately when the rsync starts. When I suspend the rsync, my shell recovers in about 10 seconds. I could reproduce this many times.
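
As a rough cross-check of the rsync limits against the ifstat figures above, here is a minimal sketch that converts the two --bwlimit values to Mbit/s. It assumes that the "k" suffix means units of 1024 bytes per second, which is my reading of the rsync man page; the function name is only for illustration.

```python
def bwlimit_to_mbit(kib_per_s: int) -> float:
    """Convert an rsync --bwlimit value to Mbit/s (10**6 bits per second).

    Assumption: the limit is expressed in units of 1024 bytes per second.
    """
    return kib_per_s * 1024 * 8 / 1_000_000


# The two limits from the post: somewhat usable vs. unusable shell.
for limit in (1400, 1500):
    print(f"--bwlimit {limit}k is about {bwlimit_to_mbit(limit):.2f} Mbit/s")
```

Under that assumption the threshold between "somewhat usable" and "unusable" sits between roughly 11.5 and 12.3 Mbit/s of payload, which is in the same range as the re1 rates where ifstat appeared to stall.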

A speedtest-cli run over the LAN shows

    Download: 6.34 Mbit/s

while "systat ifstat 0.01" shows a peak of 24.312 Mb/s. Another speedtest-cli run over the LAN shows

    Download: 9.95 Mbit/s

with a systat peak of 20.981 Mb/s.

A speedtest-cli run over the LAN using the same hardware and the same interfaces, but a different operating system, gave

    Download: 62.72 Mbit/s

but that was six months ago, and against a different target "best server".

I can also push 18.816 Mb/s of traffic from the gateway itself (not over NAT or WAN) to the LAN, and the NetBSD gateway shell is still usable, but noticeably laggy. That is roughly 1.5 times more bandwidth than in the forwarded case. So maybe the NPF NAT is the problem.

My npf.conf is:

    $ext_if = "re0"
    $int_if = "re1"

    $ext_addrs = { ifaddrs($ext_if) }
    $localnet = { 172.16.1.0/24 }

    alg "icmp"

    map inet4($ext_if) dynamic $localnet -> inet4($ext_if)

    group "external" on $ext_if {
            pass stateful out all
            block in all
    }

    group "internal" on $int_if {
            pass final all
    }

    group default {
            pass final on lo0 all
            block all
    }

I am unsure whether NPF is the problem. Maybe my interface has a problem, but logging in and using the shell locally on this system worked fine many times before I put NetBSD on it.

Any suggestions on tuning so that my shell on the router is usable?

Here is "systat vmstat 0.01" when it hangs (the column alignment was lost, so the values are listed in their original order):

    4 users    Load  0.12  0.05  0.05    Sat Mar 26 18:31:58

    Proc:r  d  s  Csw  Traps  SysCal  Intr  Soft  Fault  PAGING  SWAPPING
    1  6  114  1193  1200  1000  in  out  in  out
    ops
    14.3% Sy  0.0% Us  0.0% Ni  3.6% In  82.1% Id  pages
    |    |    |    |    |    |    |    |    |    |    |
    =======%%  forks
    fkppw
    Anon  130180  4%  zero  302356  1250 Interrupts  fksvm
    Exec  24816  %  wired  24  TLB shootdown  pwait
    File  1831888  61%  inact  671384  100 cpu0 timer  relck
    Meta  409088  %  bufs  89448  336 ioapic0 pin 16  rlkok
    (kB)  real  swaponly  free  ioapic0 pin 18  noram
    Active  1315476  331500  814 msix1 vec 0  ndcpy
    Namei  Sys-cache  Proc-cache  ioapic0 pin 23  fltcp
    Calls  hits  %  hits  %  ioapic0 pin 19  zfod
    6  6  100  cow
    512 fmin
    Disks:  sd0  wd0  dk0  dk1  682 ftarg
    seeks  itarg
    xfers  flnan
    bytes  pdfre
    %busy

Any suggestions on how I can better diagnose this?