Thanks Alexandr and Stuart for your replies. I categorized this as a
bug in my head, because of (umb0) vs umb0 usage in my pf rules:


pce-0035# diff -u pf.conf-t1 pf.conf
--- pf.conf-t1   Wed May 19 07:45:27 2021
+++ pf.conf        Thu Mar  4 19:34:25 2021
@@ -6,7 +6,7 @@
 queue q_umb0 on umb0 flows 1024 qlimit 50 quantum 300 default
 queue q_athn0 on athn0 flows 1024 qlimit 100 quantum 300 default
 
-match out on umb0 inet from !umb0 nat-to umb0:0
+match out on umb0 inet from !(umb0) nat-to (umb0:0)
 match on umb0 inet all scrub (no-df random-id max-mss 1460)
 
 block return


Then via pfctl -vnf pf.conf...

pce-0035# diff -u pfctl-vnf-t1 pfctl-vnf
--- pfctl-vv-sn-f-t1     Wed May 19 07:45:54 2021
+++ pfctl-vv-sn-f     Wed May 19 07:46:03 2021
@@ -2,7 +2,7 @@
 set limit states 25000
 queue q_umb0 on umb0 flows 1024 quantum 300 default qlimit 50
 queue q_athn0 on athn0 flows 1024 quantum 300 default qlimit 100
-match out on umb0 inet from ! 100.125.241.79 to any nat-to 100.125.241.79
+match out on umb0 inet from ! (umb0) to any nat-to (umb0:0)
 match on umb0 inet all scrub (no-df random-id max-mss 1460)
 block return all
 pass quick proto icmp all

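To make the "parse pfctl -ss" idea from Stuart's reply (quoted below) a
bit more concrete, here is a minimal, untested-on-the-router sketch. It
assumes IPv4 only and the one-line-per-state format that pfctl -ss
prints on my machine; the function name stale_nat_states is made up for
illustration. It prints the inside address and the peer address of every
outbound NAT state whose translated source address is not the current
address of umb0.

```shell
#!/bin/sh
# stale_nat_states CURRENT_ADDR  (hypothetical helper, not part of pfctl)
# Reads `pfctl -ss`-style lines on stdin and prints "inside_addr peer_addr"
# for each outbound NAT state whose translated source address differs
# from CURRENT_ADDR. Assumes IPv4 "addr:port" notation.
stale_nat_states() {
    awk -v cur="$1" '
        # NAT states look like:
        #   all udp XLATE:port (INSIDE:port) -> PEER:port STATE
        $5 == "->" && $4 ~ /^\(.*\)$/ {
            xlate = $3
            sub(/:[0-9]+$/, "", xlate)
            if (xlate == cur)
                next                    # mapping uses the current address
            inside = substr($4, 2, length($4) - 2)
            sub(/:[0-9]+$/, "", inside)
            peer = $6
            sub(/:[0-9]+$/, "", peer)
            print inside, peer
        }'
}

# Hypothetical usage on the router (needs root; I have not verified that
# pfctl -k removes the natted state in this situation):
#   cur=$(ifconfig umb0 | awk '$1 == "inet" { print $2; exit }')
#   pfctl -ss | stale_nat_states "$cur" | while read -r inside peer; do
#       pfctl -k "$inside" -k "$peer"
#   done
```

Printing address pairs instead of killing states directly keeps the
parsing separate from the destructive step, so the output can be
inspected first.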

On Wed, May 19, 2021 at 07:57:03AM +0100, Stuart Henderson wrote:
> This can happen with any long lived UDP-based protocol that is natted to a
> dynamically configured address. I think this is not a bug; PF is doing
> exactly what you have asked it to.
> 
> The issue is that you're sending packets frequently (due to the fairly low
> keepalive timer), which refreshes the PF state and keeps the NAT mapping
> alive. Normally this is what you want; it's the whole point of the
> keepalive, but it falls apart when the address changes.
> 
> As you've said, flushing all states does the trick (and it's obvious why);
> this could possibly be automated via ifstated. It's a bit heavy-handed to
> flush all states when only those involving one natted IP address actually
> need it, but pf doesn't have a more targeted way to select states using a
> certain NAT address, short of parsing pfctl -ss and killing the individual
> states by id. Alternatively (not a generic solution, but it might work in
> your case), use ifstated to run pfctl -k to delete states from the known
> address of the rpi to the known wireguard endpoint address. I'm hoping that
> the change of address would also involve a link state change so that
> ifstated can trigger on this. However, the interface used by pfctl -k is
> buggy and I can't remember if this use will run into a problem or not...
> 
> I don't know if it might make sense to handle automatically in the kernel;
> it would be convenient for the user but would I think be delicate work,
> especially regarding MP locking.
> 
> -- 
>  Sent from a phone, apologies for poor formatting.
> On 19 May 2021 01:16:06 Mikolaj Kucharski <[email protected]> wrote:
> 
> > Forgot to also show PF rules:
> > 
> > pce-0035# grep -ve '^$' -e '^#' /etc/pf.conf
> > set skip on lo
> > set limit states 25000
> > queue q_umb0 on umb0 flows 1024 qlimit 50 quantum 300 default
> > queue q_athn0 on athn0 flows 1024 qlimit 100 quantum 300 default
> > match out on umb0 inet from !(umb0) nat-to (umb0:0)
> > match on umb0 inet all scrub (no-df random-id max-mss 1460)
> > block return
> > pass quick proto icmp all
> > pass quick proto icmp6 all
> > pass quick on tun0 from (tun0:network) to any keep state (if-bound)
> > pass in quick proto tcp from any to (self) port ssh
> > pass in quick proto udp from any to (self) port 51820
> > pass quick on { athn0 em0 em1 em2 }
> > pass out
> > 
> > 
> > On Wed, May 19, 2021 at 12:10:45AM +0000, [email protected] wrote:
> > > > Synopsis: wireguard traffic blackholed after umb(4) changes ip addr
> > > > Category: kernel
> > > > Environment:
> > > System      : OpenBSD 6.9
> > > Details     : OpenBSD 6.9-current (GENERIC.MP) #14: Tue May 11 18:41:12 UTC 2021
> > > [email protected]:/home/mkucharski/openbsd/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > Architecture: OpenBSD.amd64
> > > Machine     : amd64
> > > > Description:
> > > This also occurs on a vanilla kernel, but at present I'm running a
> > > custom kernel with some athn(4)-related changes.
> > > 
> > > Once in a while umb(4) disconnects from the network and reconnects with a
> > > new IP address. The OpenBSD machine from this bug report has a wg0
> > > interface, and when umb(4) changes IP address the wireguard tunnel from
> > > OpenBSD to external peer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= is
> > > NOT affected:
> > > 
> > > pce-0035# ifconfig wg0
> > > wg0: flags=80c3<UP,BROADCAST,RUNNING,NOARP,MULTICAST> mtu 1420
> > > index 9 priority 0 llprio 3
> > > wgport 51820
> > > wgpubkey BvWfmzqI94CkkI5TygWcmT10de8+7DUA2cxsl3jPeyo=
> > > wgpeer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
> > >        wgpsk (present)
> > >        wgpka 25 (sec)
> > >        wgendpoint 5.135.165.132 51820
> > >        tx: 19396164, rx: 2960456
> > >        last handshake: 57 seconds ago
> > >        wgaip fde4:f456:48c2:13c0::/64
> > > groups: wg
> > > inet6 fde4:f456:48c2:13c0::cc35 prefixlen 64
> > > 
> > > Per the above output, the last handshake is pretty recent. However, an
> > > RPi running Linux is connected over an em(4) device, and it also has a
> > > wireguard tunnel configured to the same endpoint:
> > > 
> > > rpi-0058:~# wg
> > > interface: wg0
> > > public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c=
> > > private key: (hidden)
> > > listening port: 51820
> > > 
> > > peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
> > > preshared key: (hidden)
> > > endpoint: 5.135.165.132:51820
> > > allowed ips: fde4:f456:48c2:13c0::/64
> > > latest handshake: 1 day, 29 minutes, 14 seconds ago
> > > transfer: 424.79 KiB received, 3.09 MiB sent
> > > persistent keepalive: every 25 seconds
> > > 
> > > Per the above output, the latest handshake was more than a day ago.
> > > When I look at tcpdump on the em(4) interface connected to the RPi, I see
> > > the following:
> > > 
> > > pce-0035# tcpdump -c5 -ni em1 host 5.135.165.132 and port 51820
> > > tcpdump: listening on em1, link-type EN10MB
> > > 23:22:03.803721 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0xec39d550 [tos 0x88]
> > > 23:22:08.923739 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x13821437 [tos 0x88]
> > > 23:22:14.043830 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x3fda1931 [tos 0x88]
> > > 23:22:19.803752 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x1537a4da [tos 0x88]
> > > 23:22:25.083788 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x93343c5f [tos 0x88]
> > > 
> > > Then when I look at tcpdump on umb(4) I see traffic from the local
> > > machine, which works fine, but also the above initiation traffic from the
> > > RPi, which uses a wrong (old) IP address (TWO wrong IP addresses):
> > > 
> > > pce-0035# tcpdump -c50 -ni umb0 host 5.135.165.132 and port 51820
> > > tcpdump: listening on umb0, link-type LOOP
> > > 23:23:30.355499 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 160 to 0x12b4877c nonce 6
> > > 23:23:30.355695 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 160 to 0x12b4877c nonce 7
> > > 23:23:30.421959 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 7
> > > 23:23:30.614535 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 8
> > > 23:23:31.360393 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 8
> > > 23:23:31.360398 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 256 to 0x12b4877c nonce 9
> > > 23:23:31.431873 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 9
> > > 23:23:32.369725 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 10
> > > 23:23:32.683202 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation from 0x3007810a [tos 0x88]
> > > 23:23:32.744484 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 10
> > > 23:23:33.379929 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 256 to 0x12b4877c nonce 11
> > > 23:23:33.380107 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 12
> > > 23:23:33.518203 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 11
> > > 23:23:34.389827 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 13
> > > 23:23:34.753559 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 12
> > > 23:23:35.399748 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 14
> > > 23:23:35.774192 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 13
> > > 23:23:35.903086 100.100.10.26.55426 > 5.135.165.132.51820: [wg] initiation from 0xa30f0817 [tos 0x88]
> > > 23:23:36.409833 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 15
> > > 23:23:36.409838 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 256 to 0x12b4877c nonce 16
> > > 23:23:36.550602 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 14
> > > 23:23:37.419936 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 17
> > > 23:23:37.712890 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 15
> > > 23:23:37.882966 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation from 0x293df129 [tos 0x88]
> > > 23:23:38.429823 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 18
> > > 23:23:38.834033 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 16
> > > 23:23:39.439863 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 19
> > > 23:23:39.801909 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 17
> > > 23:23:40.449874 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 20
> > > 23:23:40.833360 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 18
> > > 23:23:41.279245 100.100.10.26.55426 > 5.135.165.132.51820: [wg] initiation from 0xe9cf88d8 [tos 0x88]
> > > 23:23:41.459919 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 21
> > > 23:23:41.794501 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 19
> > > 23:23:42.469957 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 22
> > > 23:23:42.833630 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 20
> > > 23:23:43.082901 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation from 0x41dfd760 [tos 0x88]
> > > 23:23:43.479931 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 23
> > > 23:23:43.873761 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 21
> > > 23:23:44.489891 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 24
> > > 23:23:44.833901 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 22
> > > 23:23:45.500016 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 25
> > > 23:23:45.874031 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 23
> > > 23:23:46.398752 100.100.10.26.55426 > 5.135.165.132.51820: [wg] initiation from 0x2aaeec29 [tos 0x88]
> > > 23:23:46.510138 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 480 to 0x12b4877c nonce 26
> > > 23:23:46.782331 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 24
> > > 23:23:47.520027 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 27
> > > 23:23:47.794006 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 25
> > > 23:23:48.530003 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data length 368 to 0x12b4877c nonce 28
> > > 23:23:48.762716 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation from 0xa0dadeeb [tos 0x88]
> > > 23:23:48.904684 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data length 80 to 0xb8f89c82 nonce 26
> > > 
> > > In the output above, look for the `initiation from` lines.
> > > 
> > > Compare IP address 100.125.241.79 (working) with 100.100.10.26 (not
> > > working) and 100.102.59.145 (also not working), and see the states in PF:
> > > 
> > > pce-0035# pfctl -ss | grep -F 5.135.165.132:51820
> > > all udp 5.135.165.132:51820 <- 192.168.1.58:51820       MULTIPLE:MULTIPLE
> > > all udp 100.102.59.145:52422 (192.168.1.58:51820) -> 5.135.165.132:51820       MULTIPLE:MULTIPLE
> > > all udp 5.135.165.132:51820 <- 192.168.5.189:51820       MULTIPLE:MULTIPLE
> > > all udp 100.100.10.26:55426 (192.168.5.189:51820) -> 5.135.165.132:51820       MULTIPLE:MULTIPLE
> > > all udp 100.125.241.79:51820 -> 5.135.165.132:51820       MULTIPLE:MULTIPLE
> > > 
> > > Current IP address on umb(4) is 100.125.241.79:
> > > 
> > > pce-0035# ifconfig umb0
> > > umb0: flags=8855<UP,DEBUG,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > index 7 priority 6 llprio 3
> > > roaming disabled registration home network
> > > state up cell-class LTE rssi -101dBm speed 47.7Mbps up 286Mbps down
> > > SIM initialized PIN valid (3 attempts left)
> > > subscriber-id 000000000000000 ICC-id 00000000000000000000 provider PLAY
> > > device MC7455 IMEI 000000000000000 firmware SWI9X30C_02.33.03.00
> > > APN internet
> > > dns 89.108.202.20 185.89.185.1
> > > groups: egress
> > > status: active
> > > inet 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0
> > > 
> > > I run ifconfig(8) periodically and send its output to syslog, so I can
> > > see that the IP address changed twice during the uptime of this machine:
> > > 
> > > # zgrep -F 'inet 100' /var/log/messages...
> > > ...
> > > 2021-05-17T23:21:01.528Z pce-0035 ifconfig[80523]:      inet 100.102.59.145 --> 100.102.59.146 netmask 0xfffffffc
> > > 2021-05-17T23:32:47.049Z pce-0035 ifconfig[43011]:      inet 100.102.59.145 --> 100.102.59.146 netmask 0xfffffffc
> > > 2021-05-17T23:32:47.049Z pce-0035 ifconfig[43011]:      inet 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc
> > > 2021-05-18T00:21:01.195Z pce-0035 ifconfig[96823]:      inet 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc
> > > ...
> > > 2021-05-18T16:21:01.975Z pce-0035 ifconfig[21612]:      inet 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc
> > > 2021-05-18T16:38:15.484Z pce-0035 ifconfig[43011]:      inet 100.100.10.26 --> 100.100.10.25 netmask 0xfffffffc
> > > 2021-05-18T16:38:15.484Z pce-0035 ifconfig[43011]:      inet 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0
> > > 2021-05-18T17:21:01.649Z pce-0035 ifconfig[27235]:      inet 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0
> > > ...
> > > 
> > > > How-To-Repeat:
> > > Set up NAT with PF and connect a wireguard client from the internal
> > > network through an external interface whose IP address changes once
> > > in a while; in my case it's umb(4).
> > > 
> > > > Fix:
> > > Unknown. There are many possible workarounds; pfctl -Fs seems the simplest?
> > > 
> > > After pfctl -Fs, the wireguard tunnel works straight away:
> > > 
> > > rpi-0058:~# ping6 -n -c5 fde4:f456:48c2:13c0::1
> > > PING fde4:f456:48c2:13c0::1(fde4:f456:48c2:13c0::1) 56 data bytes
> > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=2 ttl=64 time=54.3 ms
> > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=3 ttl=64 time=58.7 ms
> > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=4 ttl=64 time=60.9 ms
> > > 64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=5 ttl=64 time=70.7 ms
> > > 
> > > --- fde4:f456:48c2:13c0::1 ping statistics ---
> > > 5 packets transmitted, 4 received, 20% packet loss, time 15ms
> > > rtt min/avg/max/mdev = 54.346/61.151/70.704/5.993 ms
> > > 
> > > 
> > > Latest handshake obviously drops to a recent value:
> > > 
> > > rpi-0058:~# wg
> > > interface: wg0
> > > public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c=
> > > private key: (hidden)
> > > listening port: 51820
> > > 
> > > peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
> > > preshared key: (hidden)
> > > endpoint: 5.135.165.132:51820
> > > allowed ips: fde4:f456:48c2:13c0::/64
> > > latest handshake: 1 minute, 20 seconds ago
> > > transfer: 429.69 KiB received, 3.12 MiB sent
> > > persistent keepalive: every 25 seconds
> > 
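Since I already run ifconfig(8) periodically, one crude way to automate
the pfctl -Fs workaround would be to compare the current umb0 address
with the one seen on the previous run. The following is only a sketch
under my own assumptions: the helper names inet_addr and addr_changed
and the state file path in the usage comment are made up, and polling
from cron is my alternative to the ifstated approach, not something
suggested in the thread.

```shell
#!/bin/sh
# inet_addr: print the first IPv4 address found in ifconfig(8) output
# read from stdin. (Hypothetical helper.)
inet_addr() {
    awk '$1 == "inet" { print $2; exit }'
}

# addr_changed STATEFILE NEWADDR  (hypothetical helper)
# Returns 0 and records NEWADDR when it differs from the address saved in
# STATEFILE; returns 1 when it is unchanged or NEWADDR is empty.
# Note: the very first run (no STATEFILE yet) counts as a change.
addr_changed() {
    statefile=$1
    new=$2
    [ -n "$new" ] || return 1
    old=
    [ -r "$statefile" ] && old=$(cat "$statefile")
    [ "$new" = "$old" ] && return 1
    printf '%s\n' "$new" > "$statefile"
    return 0
}

# Hypothetical usage from cron on the router (needs root; the path is an
# example only):
#   new=$(ifconfig umb0 | inet_addr)
#   addr_changed /var/run/umb0.addr "$new" && pfctl -Fs
```

This still flushes every state, so it has the same heavy-handedness as
running pfctl -Fs by hand; it only removes the manual step.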

-- 
Regards,
 Mikolaj
