Thanks Alexandr and Stuart for your replies. I categorized this as a bug in my head, because of (umb0) vs umb0 usage in my pf rules:
pce-0035# diff -u pf.conf-t1 pf.conf --- pf.conf-t1 Wed May 19 07:45:27 2021 +++ pf.conf Thu Mar 4 19:34:25 2021 @@ -6,7 +6,7 @@ queue q_umb0 on umb0 flows 1024 qlimit 50 quantum 300 default queue q_athn0 on athn0 flows 1024 qlimit 100 quantum 300 default -match out on umb0 inet from !umb0 nat-to umb0:0 +match out on umb0 inet from !(umb0) nat-to (umb0:0) match on umb0 inet all scrub (no-df random-id max-mss 1460) block return Then via pfctl -vnf pf.conf... pce-0035# diff -u pfctl-vnf-t1 pfctl-vnf --- pfctl-vv-sn-f-t1 Wed May 19 07:45:54 2021 +++ pfctl-vv-sn-f Wed May 19 07:46:03 2021 @@ -2,7 +2,7 @@ set limit states 25000 queue q_umb0 on umb0 flows 1024 quantum 300 default qlimit 50 queue q_athn0 on athn0 flows 1024 quantum 300 default qlimit 100 -match out on umb0 inet from ! 100.125.241.79 to any nat-to 100.125.241.79 +match out on umb0 inet from ! (umb0) to any nat-to (umb0:0) match on umb0 inet all scrub (no-df random-id max-mss 1460) block return all pass quick proto icmp all On Wed, May 19, 2021 at 07:57:03AM +0100, Stuart Henderson wrote: > This can happen with any long lived UDP-based protocol that is natted to a > dynamically configured address. I think this is not a bug; PF is doing > exactly what you have asked it to. > > The issue is that you're sending packets frequently (due to the fairly low > keepalive timer) which refreshes the PF state and keeping the NAT mapping > alive. Normally this is what you want, it's the whole point of the > keepalive, but it falls apart when the address changes. > > As you've said, flushing all states does the trick (and it's obvious why); > this could possibly be automated via ifstated. It's a bit heavy handed to > flush all states when only those involving one natted IP address actually > need it but pf doesn't have a more targetted way for states using a certain > NAT address, short of parsing pfctl -ss and killing the individual states by > id. Alternatively, not a generic solution but might work in your case is to > use ifstated to run pfctl -k to delete states from the known address of the > rpi to the known wireguard endpoint address. I'm hoping that the change of > address would also involve a link state change so that ifstated can trigger > on this. However the interface used by pfctl -k is buggy and I can't > remember if this use will run into a problem or not... > > I don't know if it might make sense to handle automatically in the kernel; > it would be convenient for the user but would I think be delicate work, > especially regarding MP locking. > > -- > Sent from a phone, apologies for poor formatting. > On 19 May 2021 01:16:06 Mikolaj Kucharski <[email protected]> wrote: > > > Forgot to also show PF rules: > > > > pce-0035# grep -ve '^$' -e '^#' /etc/pf.conf > > set skip on lo > > set limit states 25000 > > queue q_umb0 on umb0 flows 1024 qlimit 50 quantum 300 default > > queue q_athn0 on athn0 flows 1024 qlimit 100 quantum 300 default > > match out on umb0 inet from !(umb0) nat-to (umb0:0) > > match on umb0 inet all scrub (no-df random-id max-mss 1460) > > block return > > pass quick proto icmp all > > pass quick proto icmp6 all > > pass quick on tun0 from (tun0:network) to any keep state (if-bound) > > pass in quick proto tcp from any to (self) port ssh > > pass in quick proto udp from any to (self) port 51820 > > pass quick on { athn0 em0 em1 em2 } > > pass out > > > > > > On Wed, May 19, 2021 at 12:10:45AM +0000, [email protected] wrote: > > > > Synopsis: wireguard traffic blackholed after umb(4) changes ip addr > > > > Category: kernel > > > > Environment: > > > System : OpenBSD 6.9 > > > Details : OpenBSD 6.9-current (GENERIC.MP) #14: Tue May 11 > > > 18:41:12 UTC 2021 > > > [email protected]:/home/mkucharski/openbsd/src/sys/arch/amd64/compile/GENERIC.MP > > > > > > Architecture: OpenBSD.amd64 > > > Machine : amd64 > > > > Description: > > > This also occurs on vanilla kernel, but at present I'm running > > > custom kernel, with some athn(4) related changes. > > > > > > Once in a while umb(4) disconnects from network and reconnects with new > > > IP address. OpenBSD machine, from this bug report has a wg0 interface > > > and when umb(4) changes IP address wireguard tunnel from OpenBSD to > > > external peer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= is NOT > > > affected: > > > > > > pce-0035# ifconfig wg0 > > > wg0: flags=80c3<UP,BROADCAST,RUNNING,NOARP,MULTICAST> mtu 1420 > > > index 9 priority 0 llprio 3 > > > wgport 51820 > > > wgpubkey BvWfmzqI94CkkI5TygWcmT10de8+7DUA2cxsl3jPeyo= > > > wgpeer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= > > > wgpsk (present) > > > wgpka 25 (sec) > > > wgendpoint 5.135.165.132 51820 > > > tx: 19396164, rx: 2960456 > > > last handshake: 57 seconds ago > > > wgaip fde4:f456:48c2:13c0::/64 > > > groups: wg > > > inet6 fde4:f456:48c2:13c0::cc35 prefixlen 64 > > > > > > Per above output last handshake is pretty recent. However, over em(1) > > > device there is RPi connected with Linux, which also has wireguard > > > tunnel configured to the same endpoint: > > > > > > rpi-0058:~# wg > > > interface: wg0 > > > public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c= > > > private key: (hidden) > > > listening port: 51820 > > > > > > peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= > > > preshared key: (hidden) > > > endpoint: 5.135.165.132:51820 > > > allowed ips: fde4:f456:48c2:13c0::/64 > > > latest handshake: 1 day, 29 minutes, 14 seconds ago > > > transfer: 424.79 KiB received, 3.09 MiB sent > > > persistent keepalive: every 25 seconds > > > > > > Per above output, we see that latest handshake was more than a day ago. > > > When I look tcpdump on em(4) which is connected to RPi I see following: > > > > > > pce-0035# tcpdump -c5 -ni em1 host 5.135.165.132 and port 51820 > > > tcpdump: listening on em1, link-type EN10MB > > > 23:22:03.803721 192.168.1.58.51820 > 5.135.165.132.51820: [wg] > > > initiation from 0xec39d550 [tos 0x88] > > > 23:22:08.923739 192.168.1.58.51820 > 5.135.165.132.51820: [wg] > > > initiation from 0x13821437 [tos 0x88] > > > 23:22:14.043830 192.168.1.58.51820 > 5.135.165.132.51820: [wg] > > > initiation from 0x3fda1931 [tos 0x88] > > > 23:22:19.803752 192.168.1.58.51820 > 5.135.165.132.51820: [wg] > > > initiation from 0x1537a4da [tos 0x88] > > > 23:22:25.083788 192.168.1.58.51820 > 5.135.165.132.51820: [wg] > > > initiation from 0x93343c5f [tos 0x88] > > > > > > Then when I look at tcpdmump on umb(4) I see traffic from local machine, > > > which works fine, but also we see above initiation traffic from RPi, > > > which uses wrong (old) IP address (TWO wrong IP addresses): > > > > > > pce-0035# tcpdump -c50 -ni umb0 host 5.135.165.132 and port 51820 > > > tcpdump: listening on umb0, link-type LOOP > > > 23:23:30.355499 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 160 to 0x12b4877c nonce 6 > > > 23:23:30.355695 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 160 to 0x12b4877c nonce 7 > > > 23:23:30.421959 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 7 > > > 23:23:30.614535 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 8 > > > 23:23:31.360393 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 8 > > > 23:23:31.360398 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 256 to 0x12b4877c nonce 9 > > > 23:23:31.431873 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 9 > > > 23:23:32.369725 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 10 > > > 23:23:32.683202 100.102.59.145.52422 > 5.135.165.132.51820: [wg] > > > initiation from 0x3007810a [tos 0x88] > > > 23:23:32.744484 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 10 > > > 23:23:33.379929 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 256 to 0x12b4877c nonce 11 > > > 23:23:33.380107 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 12 > > > 23:23:33.518203 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 11 > > > 23:23:34.389827 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 13 > > > 23:23:34.753559 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 12 > > > 23:23:35.399748 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 14 > > > 23:23:35.774192 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 13 > > > 23:23:35.903086 100.100.10.26.55426 > 5.135.165.132.51820: [wg] > > > initiation from 0xa30f0817 [tos 0x88] > > > 23:23:36.409833 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 15 > > > 23:23:36.409838 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 256 to 0x12b4877c nonce 16 > > > 23:23:36.550602 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 14 > > > 23:23:37.419936 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 17 > > > 23:23:37.712890 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 15 > > > 23:23:37.882966 100.102.59.145.52422 > 5.135.165.132.51820: [wg] > > > initiation from 0x293df129 [tos 0x88] > > > 23:23:38.429823 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 18 > > > 23:23:38.834033 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 16 > > > 23:23:39.439863 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 19 > > > 23:23:39.801909 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 17 > > > 23:23:40.449874 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 20 > > > 23:23:40.833360 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 18 > > > 23:23:41.279245 100.100.10.26.55426 > 5.135.165.132.51820: [wg] > > > initiation from 0xe9cf88d8 [tos 0x88] > > > 23:23:41.459919 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 21 > > > 23:23:41.794501 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 19 > > > 23:23:42.469957 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 22 > > > 23:23:42.833630 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 20 > > > 23:23:43.082901 100.102.59.145.52422 > 5.135.165.132.51820: [wg] > > > initiation from 0x41dfd760 [tos 0x88] > > > 23:23:43.479931 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 23 > > > 23:23:43.873761 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 21 > > > 23:23:44.489891 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 24 > > > 23:23:44.833901 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 22 > > > 23:23:45.500016 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 25 > > > 23:23:45.874031 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 23 > > > 23:23:46.398752 100.100.10.26.55426 > 5.135.165.132.51820: [wg] > > > initiation from 0x2aaeec29 [tos 0x88] > > > 23:23:46.510138 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 480 to 0x12b4877c nonce 26 > > > 23:23:46.782331 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 24 > > > 23:23:47.520027 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 27 > > > 23:23:47.794006 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 25 > > > 23:23:48.530003 100.125.241.79.51820 > 5.135.165.132.51820: [wg] > > > data length 368 to 0x12b4877c nonce 28 > > > 23:23:48.762716 100.102.59.145.52422 > 5.135.165.132.51820: [wg] > > > initiation from 0xa0dadeeb [tos 0x88] > > > 23:23:48.904684 5.135.165.132.51820 > 100.125.241.79.51820: [wg] > > > data length 80 to 0xb8f89c82 nonce 26 > > > > > > Above grep for `initiation from`. > > > > > > See IP addr 100.125.241.79 (working) versus 100.100.10.26 (not working), > > > 100.102.59.145 (another not working) and states in PF: > > > > > > pce-0035# pfctl -ss | grep -F 5.135.165.132:51820 > > > all udp 5.135.165.132:51820 <- 192.168.1.58:51820 MULTIPLE:MULTIPLE > > > all udp 100.102.59.145:52422 (192.168.1.58:51820) -> > > > 5.135.165.132:51820 MULTIPLE:MULTIPLE > > > all udp 5.135.165.132:51820 <- 192.168.5.189:51820 MULTIPLE:MULTIPLE > > > all udp 100.100.10.26:55426 (192.168.5.189:51820) -> > > > 5.135.165.132:51820 MULTIPLE:MULTIPLE > > > all udp 100.125.241.79:51820 -> 5.135.165.132:51820 > > > MULTIPLE:MULTIPLE > > > > > > Current IP address on umb(4) is 100.125.241.79: > > > > > > pce-0035# ifconfig umb0 > > > umb0: flags=8855<UP,DEBUG,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500 > > > index 7 priority 6 llprio 3 > > > roaming disabled registration home network > > > state up cell-class LTE rssi -101dBm speed 47.7Mbps up 286Mbps down > > > SIM initialized PIN valid (3 attempts left) > > > subscriber-id 000000000000000 ICC-id 00000000000000000000 provider PLAY > > > device MC7455 IMEI 000000000000000 firmware SWI9X30C_02.33.03.00 > > > APN internet > > > dns 89.108.202.20 185.89.185.1 > > > groups: egress > > > status: active > > > inet 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0 > > > > > > I do periodic ifconfig(8) and send it to syslog, so I see IP address > > > changed twice during uptime of this machine: > > > > > > # zgrep -F 'inet 100' /var/log/messages... > > > ... > > > 2021-05-17T23:21:01.528Z pce-0035 ifconfig[80523]: inet > > > 100.102.59.145 --> 100.102.59.146 netmask 0xfffffffc > > > 2021-05-17T23:32:47.049Z pce-0035 ifconfig[43011]: inet > > > 100.102.59.145 --> 100.102.59.146 netmask 0xfffffffc > > > 2021-05-17T23:32:47.049Z pce-0035 ifconfig[43011]: inet > > > 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc > > > 2021-05-18T00:21:01.195Z pce-0035 ifconfig[96823]: inet > > > 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc > > > ... > > > 2021-05-18T16:21:01.975Z pce-0035 ifconfig[21612]: inet > > > 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc > > > 2021-05-18T16:38:15.484Z pce-0035 ifconfig[43011]: inet > > > 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc > > > 2021-05-18T16:38:15.484Z pce-0035 ifconfig[43011]: inet > > > 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0 > > > 2021-05-18T17:21:01.649Z pce-0035 ifconfig[27235]: inet > > > 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0 > > > ... > > > > > > > How-To-Repeat: > > > Setup NAT with PF, connect wireguard client over internal > > > network, which goes over external interface which changes IP address > > > once in a while, in my case it's umb(4). > > > > > > > Fix: > > > Unknown. Many workarounds, pfctl -Fs, seems the simplest? > > > > > > After pfctl -Fs, wireguard tunnel works straightaway: > > > > > > rpi-0058:~# ping6 -n -c5 fde4:f456:48c2:13c0::1 > > > PING fde4:f456:48c2:13c0::1(fde4:f456:48c2:13c0::1) 56 data bytes > > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=2 ttl=64 time=54.3 ms > > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=3 ttl=64 time=58.7 ms > > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=4 ttl=64 time=60.9 ms > > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=5 ttl=64 time=70.7 ms > > > > > > --- fde4:f456:48c2:13c0::1 ping statistics --- > > > 5 packets transmitted, 4 received, 20% packet loss, time 15ms > > > rtt min/avg/max/mdev = 54.346/61.151/70.704/5.993 ms > > > > > > > > > Latest handshake obviously drops to a recent value: > > > > > > rpi-0058:~# wg > > > interface: wg0 > > > public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c= > > > private key: (hidden) > > > listening port: 51820 > > > > > > peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= > > > preshared key: (hidden) > > > endpoint: 5.135.165.132:51820 > > > allowed ips: fde4:f456:48c2:13c0::/64 > > > latest handshake: 1 minute, 20 seconds ago > > > transfer: 429.69 KiB received, 3.12 MiB sent > > > persistent keepalive: every 25 seconds > > -- Regards, Mikolaj
