Synopsis: wireguard traffic blackholed after umb(4) changes ip addr
Category: kernel
Environment:
System : OpenBSD 6.9
Details : OpenBSD 6.9-current (GENERIC.MP) #14: Tue May 11 18:41:12 UTC
2021
r...@pc1.home.local:/home/mkucharski/openbsd/src/sys/arch/amd64/compile/GENERIC.MP
Architecture: OpenBSD.amd64
Machine : amd64
Description:
This also occurs on vanilla kernel, but at present I'm running
custom kernel, with some athn(4) related changes.
Once in a while umb(4) disconnects from network and reconnects with new
IP address. OpenBSD machine, from this bug report has a wg0 interface
and when umb(4) changes IP address wireguard tunnel from OpenBSD to
external peer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= is NOT
affected:
pce-0035# ifconfig wg0
wg0: flags=80c3<UP,BROADCAST,RUNNING,NOARP,MULTICAST> mtu 1420
index 9 priority 0 llprio 3
wgport 51820
wgpubkey BvWfmzqI94CkkI5TygWcmT10de8+7DUA2cxsl3jPeyo=
wgpeer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
wgpsk (present)
wgpka 25 (sec)
wgendpoint 5.135.165.132 51820
tx: 19396164, rx: 2960456
last handshake: 57 seconds ago
wgaip fde4:f456:48c2:13c0::/64
groups: wg
inet6 fde4:f456:48c2:13c0::cc35 prefixlen 64
Per above output last handshake is pretty recent. However, over em(1)
device there is RPi connected with Linux, which also has wireguard
tunnel configured to the same endpoint:
rpi-0058:~# wg
interface: wg0
public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c=
private key: (hidden)
listening port: 51820
peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
preshared key: (hidden)
endpoint: 5.135.165.132:51820
allowed ips: fde4:f456:48c2:13c0::/64
latest handshake: 1 day, 29 minutes, 14 seconds ago
transfer: 424.79 KiB received, 3.09 MiB sent
persistent keepalive: every 25 seconds
Per above output, we see that latest handshake was more than a day ago.
When I look tcpdump on em(4) which is connected to RPi I see following:
pce-0035# tcpdump -c5 -ni em1 host 5.135.165.132 and port 51820
tcpdump: listening on em1, link-type EN10MB
23:22:03.803721 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation
from 0xec39d550 [tos 0x88]
23:22:08.923739 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation
from 0x13821437 [tos 0x88]
23:22:14.043830 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation
from 0x3fda1931 [tos 0x88]
23:22:19.803752 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation
from 0x1537a4da [tos 0x88]
23:22:25.083788 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation
from 0x93343c5f [tos 0x88]
Then when I look at tcpdmump on umb(4) I see traffic from local machine,
which works fine, but also we see above initiation traffic from RPi,
which uses wrong (old) IP address (TWO wrong IP addresses):
pce-0035# tcpdump -c50 -ni umb0 host 5.135.165.132 and port 51820
tcpdump: listening on umb0, link-type LOOP
23:23:30.355499 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 160 to 0x12b4877c nonce 6
23:23:30.355695 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 160 to 0x12b4877c nonce 7
23:23:30.421959 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 7
23:23:30.614535 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 8
23:23:31.360393 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 8
23:23:31.360398 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 256 to 0x12b4877c nonce 9
23:23:31.431873 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 9
23:23:32.369725 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 10
23:23:32.683202 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation
from 0x3007810a [tos 0x88]
23:23:32.744484 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 10
23:23:33.379929 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 256 to 0x12b4877c nonce 11
23:23:33.380107 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 12
23:23:33.518203 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 11
23:23:34.389827 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 13
23:23:34.753559 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 12
23:23:35.399748 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 14
23:23:35.774192 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 13
23:23:35.903086 100.100.10.26.55426 > 5.135.165.132.51820: [wg] initiation
from 0xa30f0817 [tos 0x88]
23:23:36.409833 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 15
23:23:36.409838 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 256 to 0x12b4877c nonce 16
23:23:36.550602 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 14
23:23:37.419936 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 17
23:23:37.712890 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 15
23:23:37.882966 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation
from 0x293df129 [tos 0x88]
23:23:38.429823 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 18
23:23:38.834033 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 16
23:23:39.439863 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 19
23:23:39.801909 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 17
23:23:40.449874 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 20
23:23:40.833360 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 18
23:23:41.279245 100.100.10.26.55426 > 5.135.165.132.51820: [wg] initiation
from 0xe9cf88d8 [tos 0x88]
23:23:41.459919 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 21
23:23:41.794501 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 19
23:23:42.469957 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 22
23:23:42.833630 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 20
23:23:43.082901 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation
from 0x41dfd760 [tos 0x88]
23:23:43.479931 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 23
23:23:43.873761 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 21
23:23:44.489891 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 24
23:23:44.833901 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 22
23:23:45.500016 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 25
23:23:45.874031 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 23
23:23:46.398752 100.100.10.26.55426 > 5.135.165.132.51820: [wg] initiation
from 0x2aaeec29 [tos 0x88]
23:23:46.510138 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 480 to 0x12b4877c nonce 26
23:23:46.782331 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 24
23:23:47.520027 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 27
23:23:47.794006 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 25
23:23:48.530003 100.125.241.79.51820 > 5.135.165.132.51820: [wg] data
length 368 to 0x12b4877c nonce 28
23:23:48.762716 100.102.59.145.52422 > 5.135.165.132.51820: [wg] initiation
from 0xa0dadeeb [tos 0x88]
23:23:48.904684 5.135.165.132.51820 > 100.125.241.79.51820: [wg] data
length 80 to 0xb8f89c82 nonce 26
Above grep for `initiation from`.
See IP addr 100.125.241.79 (working) versus 100.100.10.26 (not working),
100.102.59.145 (another not working) and states in PF:
pce-0035# pfctl -ss | grep -F 5.135.165.132:51820
all udp 5.135.165.132:51820 <- 192.168.1.58:51820 MULTIPLE:MULTIPLE
all udp 100.102.59.145:52422 (192.168.1.58:51820) -> 5.135.165.132:51820
MULTIPLE:MULTIPLE
all udp 5.135.165.132:51820 <- 192.168.5.189:51820 MULTIPLE:MULTIPLE
all udp 100.100.10.26:55426 (192.168.5.189:51820) -> 5.135.165.132:51820
MULTIPLE:MULTIPLE
all udp 100.125.241.79:51820 -> 5.135.165.132:51820 MULTIPLE:MULTIPLE
Current IP address on umb(4) is 100.125.241.79:
pce-0035# ifconfig umb0
umb0: flags=8855<UP,DEBUG,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500
index 7 priority 6 llprio 3
roaming disabled registration home network
state up cell-class LTE rssi -101dBm speed 47.7Mbps up 286Mbps down
SIM initialized PIN valid (3 attempts left)
subscriber-id 000000000000000 ICC-id 00000000000000000000 provider PLAY
device MC7455 IMEI 000000000000000 firmware SWI9X30C_02.33.03.00
APN internet
dns 89.108.202.20 185.89.185.1
groups: egress
status: active
inet 100.125.241.79 --> 100.125.241.80 netmask 0xffffffe0
I do periodic ifconfig(8) and send it to syslog, so I see IP address
changed twice during uptime of this machine:
# zgrep -F 'inet 100' /var/log/messages...
...
2021-05-17T23:21:01.528Z pce-0035 ifconfig[80523]: inet 100.102.59.145
--> 100.102.59.146 netmask 0xfffffffc
2021-05-17T23:32:47.049Z pce-0035 ifconfig[43011]: inet 100.102.59.145
--> 100.102.59.146 netmask 0xfffffffc
2021-05-17T23:32:47.049Z pce-0035 ifconfig[43011]: inet 100.100.10.26
--> 100.100.10.25 netmask 0xfffffffc
2021-05-18T00:21:01.195Z pce-0035 ifconfig[96823]: inet 100.100.10.26
--> 100.100.10.25 netmask 0xfffffffc
...
2021-05-18T16:21:01.975Z pce-0035 ifconfig[21612]: inet 100.100.10.26
--> 100.100.10.25 netmask 0xfffffffc
2021-05-18T16:38:15.484Z pce-0035 ifconfig[43011]: inet 100.100.10.26
--> 100.100.10.25 netmask 0xfffffffc
2021-05-18T16:38:15.484Z pce-0035 ifconfig[43011]: inet 100.125.241.79
--> 100.125.241.80 netmask 0xffffffe0
2021-05-18T17:21:01.649Z pce-0035 ifconfig[27235]: inet 100.125.241.79
--> 100.125.241.80 netmask 0xffffffe0
...
How-To-Repeat:
Setup NAT with PF, connect wireguard client over internal
network, which goes over external interface which changes IP address
once in a while, in my case it's umb(4).
Fix:
Unknown. Many workarounds, pfctl -Fs, seems the simplest?
After pfctl -Fs, wireguard tunnel works straightaway:
rpi-0058:~# ping6 -n -c5 fde4:f456:48c2:13c0::1
PING fde4:f456:48c2:13c0::1(fde4:f456:48c2:13c0::1) 56 data bytes
64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=2 ttl=64 time=54.3 ms
64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=3 ttl=64 time=58.7 ms
64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=4 ttl=64 time=60.9 ms
64 bytes from fde4:f456:48c2:13c0::1: icmp_seq=5 ttl=64 time=70.7 ms
--- fde4:f456:48c2:13c0::1 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 15ms
rtt min/avg/max/mdev = 54.346/61.151/70.704/5.993 ms
Latest handshake obviously drops to a recent value:
rpi-0058:~# wg
interface: wg0
public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c=
private key: (hidden)
listening port: 51820
peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
preshared key: (hidden)
endpoint: 5.135.165.132:51820
allowed ips: fde4:f456:48c2:13c0::/64
latest handshake: 1 minute, 20 seconds ago
transfer: 429.69 KiB received, 3.12 MiB sent
persistent keepalive: every 25 seconds