Hello, thank you for detailed bug report. It clearly shows there is something odd.
On Mon, May 08, 2023 at 01:30:24AM +0300, Joosep wrote: </snip> > When looking through the pf debug-level logs, 2 events stand up - first one > has no problems, second one fails.Note that source port 53081 for nat exist > in > 2 different events (this particular test run was 66 dns requests long > on an otherwise idle system): > May 8 03:35:53 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0) > 10.10.11.3:25217 10.10.10.1:53 > May 8 03:35:53 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.11.3:25217 > 10.10.10.1:53 stack: (0) - > May 8 03:35:53 lab73 /bsd: pf: key search, out on em2: UDP wire: (0) > 10.10.10.1:53 10.10.11.3:25217 > May 8 03:35:53 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2 > May 8 03:35:53 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53 > 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:25217 > May 8 03:35:53 lab73 /bsd: pf: key search, in on em2: UDP wire: (0) > 10.10.10.1:53 10.10.10.2:53081 > May 8 03:35:53 lab73 /bsd: pf: key search, out on vlan5: UDP wire: (0) > 10.10.11.3:25217 10.10.10.1:53 > ..... > May 8 03:36:02 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0) > 10.10.11.3:17189 10.10.10.1:53 > May 8 03:36:02 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.11.3:17189 > 10.10.10.1:53 stack: (0) - > May 8 03:36:02 lab73 /bsd: pf: key search, out on em2: UDP wire: (0) > 10.10.10.1:53 10.10.11.3:17189 > May 8 03:36:02 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2 > May 8 03:36:02 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53 > 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:17189 > May 8 03:36:02 lab73 /bsd: pf: wire key attach failed on all: UDP out > wire: (0) 10.10.10.1:53 10.10.10.2:53081 1:0 @3, existing: UDP out wire: > (0) 10.10.10.1:53 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:25217 > 2:1 @3 > May 8 03:36:07 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0) > 10.10.11.3:17189 10.10.10.1:53 > May 8 03:36:07 lab73 /bsd: pf: key search, out on em2: UDP wire: (0) > 10.10.10.1:53 10.10.11.3:17189 > May 8 03:36:07 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2 > May 8 03:36:07 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53 > 10.10.10.2:57351 stack: (0) 10.10.10.1:53 10.10.11.3:17189 > May 8 03:36:07 lab73 /bsd: pf: key search, in on em2: UDP wire: (0) > 10.10.10.1:53 10.10.10.2:57351 > May 8 03:36:07 lab73 /bsd: pf: key search, out on vlan5: UDP wire: (0) > 10.10.11.3:17189 10.10.10.1:53 > > It seems to me (i might be wrong), that system selects source ports > for nat in a way, that makes collisions likely. > this is very odd. source port for NAT is selected in pf_get_sport() function found in pf_lb.c. the relevant part reads as follows: 182 do { 183 key.af = pd->naf; 184 key.proto = pd->proto; 185 key.rdomain = pd->rdomain; ... 203 } else if (low == 0 && high == 0) { 204 key.port[sidx] = pd->nsport; ... 215 } else { 216 u_int32_t tmp; 217 218 if (low > high) { 219 tmp = low; 220 low = high; 221 high = tmp; 222 } 223 /* low < high */ 224 cut = arc4random_uniform(1 + high - low) + low; 225 /* low <= cut <= high */ 226 for (tmp = cut; tmp <= high && tmp <= 0xffff; ++tmp) { 227 key.port[sidx] = htons(tmp); 228 if (pf_find_state_all(&key, dir, NULL) == 229 NULL && !in_baddynamic(tmp, pd->proto)) { 230 *nport = htons(tmp); 231 return (0); 232 } 233 } 234 tmp = cut; 235 for (tmp -= 1; tmp >= low && tmp <= 0xffff; --tmp) { 236 key.port[sidx] = htons(tmp); 237 if (pf_find_state_all(&key, dir, NULL) == 238 NULL && !in_baddynamic(tmp, pd->proto)) { 239 *nport = htons(tmp); 240 return (0); 241 } 242 } 243 } If I understand code in parse.y right, the low is PF_NAT_PROXY_PORT_LOW (50001) and high is PF_NAT_PROXY_PORT_HIGH (65535), therefore port selection process happens in else branch 216 - 243. There is a for loop at lines 235 - 240 which tries really hard to find unused port. The code uses pf_find_state_all() to make sure the selected port is unused in state table. if this code suggests port number then pf_find_state_all() somehow guaranties the port can not be found in table. the odd thing is that pf_state_key_attach() reports that port number exists already in table (is in conflict). both calls: call to pf_get_sport() and call to pf_state_key_attach() happen under PF_LOCK(), so they are atomic nothing else can mess up with table. after walking through the code I suspect we fail to initialize look up for pf_find_state_all() properly, we forgot to compute a toeplitz hash. Can you give a try to diff below to see if it fixes NAT? thanks a lot regards sashan --------8<---------------8<---------------8<------------------8<-------- diff --git a/sys/net/pf_lb.c b/sys/net/pf_lb.c index a3d8d6333cd..b893dd819df 100644 --- a/sys/net/pf_lb.c +++ b/sys/net/pf_lb.c @@ -196,18 +196,24 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r, /* XXX bug: icmp states dont use the id on both * XXX sides (traceroute -I through nat) */ key.port[sidx] = pd->nsport; + key.hash = pf_pkt_hash(key.af, key.proto, &key.addr[0], + &key.addr[1], key.port[0], key.port[1]); if (pf_find_state_all(&key, dir, NULL) == NULL) { *nport = pd->nsport; return (0); } } else if (low == 0 && high == 0) { key.port[sidx] = pd->nsport; + key.hash = pf_pkt_hash(key.af, key.proto, &key.addr[0], + &key.addr[1], key.port[0], key.port[1]); if (pf_find_state_all(&key, dir, NULL) == NULL) { *nport = pd->nsport; return (0); } } else if (low == high) { key.port[sidx] = htons(low); + key.hash = pf_pkt_hash(key.af, key.proto, &key.addr[0], + &key.addr[1], key.port[0], key.port[1]); if (pf_find_state_all(&key, dir, NULL) == NULL) { *nport = htons(low); return (0); @@ -225,6 +231,9 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r, /* low <= cut <= high */ for (tmp = cut; tmp <= high && tmp <= 0xffff; ++tmp) { key.port[sidx] = htons(tmp); + key.hash = pf_pkt_hash(key.af, key.proto, + &key.addr[0], &key.addr[1], key.port[0], + key.port[1]); if (pf_find_state_all(&key, dir, NULL) == NULL && !in_baddynamic(tmp, pd->proto)) { *nport = htons(tmp); @@ -234,6 +243,9 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r, tmp = cut; for (tmp -= 1; tmp >= low && tmp <= 0xffff; --tmp) { key.port[sidx] = htons(tmp); + key.hash = pf_pkt_hash(key.af, key.proto, + &key.addr[0], &key.addr[1], key.port[0], + key.port[1]); if (pf_find_state_all(&key, dir, NULL) == NULL && !in_baddynamic(tmp, pd->proto)) { *nport = htons(tmp); diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h index 741125e447e..147cd7d8f75 100644 --- a/sys/net/pfvar_priv.h +++ b/sys/net/pfvar_priv.h @@ -406,6 +406,9 @@ void pf_state_peer_hton(const struct pf_state_peer *, struct pfsync_state_peer *); void pf_state_peer_ntoh(const struct pfsync_state_peer *, struct pf_state_peer *); +u_int16_t pf_pkt_hash(sa_family_t, uint8_t, + const struct pf_addr *, const struct pf_addr *, + uint16_t, uint16_t); #endif /* _KERNEL */