Hi

Thank you for the diff. I tested this on my lab environment with amd64 SP
and MP kernels and could not reproduce the issue anymore. I'll deploy this
to production as well and report back. This takes few days though.

Kind regards
Joosep

On Tue, May 9, 2023 at 1:15 AM Alexandr Nedvedicky <sas...@fastmail.net>
wrote:

> Hello,
>
> thank you for detailed bug report. It clearly shows there is something
> odd.
>
> On Mon, May 08, 2023 at 01:30:24AM +0300, Joosep wrote:
>
> </snip>
>
> > When looking through the pf debug-level logs, 2 events stand up - first
> one
> > has no problems, second one fails.Note that source port 53081 for nat
> exist
> > in
> > 2 different events (this particular test run was 66 dns requests long
> > on an otherwise idle system):
> > May  8 03:35:53 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0)
> > 10.10.11.3:25217 10.10.10.1:53
> > May  8 03:35:53 lab73 /bsd: pf: key setup: UDP wire: (0)
> 10.10.11.3:25217
> > 10.10.10.1:53 stack: (0) -
> > May  8 03:35:53 lab73 /bsd: pf: key search, out on em2: UDP wire: (0)
> > 10.10.10.1:53 10.10.11.3:25217
> > May  8 03:35:53 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2
> > May  8 03:35:53 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53
> > 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:25217
> > May  8 03:35:53 lab73 /bsd: pf: key search, in on em2: UDP wire: (0)
> > 10.10.10.1:53 10.10.10.2:53081
> > May  8 03:35:53 lab73 /bsd: pf: key search, out on vlan5: UDP wire: (0)
> > 10.10.11.3:25217 10.10.10.1:53
> > .....
> > May  8 03:36:02 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0)
> > 10.10.11.3:17189 10.10.10.1:53
> > May  8 03:36:02 lab73 /bsd: pf: key setup: UDP wire: (0)
> 10.10.11.3:17189
> > 10.10.10.1:53 stack: (0) -
> > May  8 03:36:02 lab73 /bsd: pf: key search, out on em2: UDP wire: (0)
> > 10.10.10.1:53 10.10.11.3:17189
> > May  8 03:36:02 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2
> > May  8 03:36:02 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53
> > 10.10.10.2:53081 stack: (0) 10.10.10.1:53 10.10.11.3:17189
> > May  8 03:36:02 lab73 /bsd: pf: wire key attach failed on all: UDP out
> > wire: (0) 10.10.10.1:53 10.10.10.2:53081 1:0 @3, existing: UDP out wire:
> > (0) 10.10.10.1:53 10.10.10.2:53081 stack: (0) 10.10.10.1:53
> 10.10.11.3:25217
> > 2:1 @3
> > May  8 03:36:07 lab73 /bsd: pf: key search, in on vlan5: UDP wire: (0)
> > 10.10.11.3:17189 10.10.10.1:53
> > May  8 03:36:07 lab73 /bsd: pf: key search, out on em2: UDP wire: (0)
> > 10.10.10.1:53 10.10.11.3:17189
> > May  8 03:36:07 lab73 /bsd: pf: pf_map_addr: selected address 10.10.10.2
> > May  8 03:36:07 lab73 /bsd: pf: key setup: UDP wire: (0) 10.10.10.1:53
> > 10.10.10.2:57351 stack: (0) 10.10.10.1:53 10.10.11.3:17189
> > May  8 03:36:07 lab73 /bsd: pf: key search, in on em2: UDP wire: (0)
> > 10.10.10.1:53 10.10.10.2:57351
> > May  8 03:36:07 lab73 /bsd: pf: key search, out on vlan5: UDP wire: (0)
> > 10.10.11.3:17189 10.10.10.1:53
> >
> > It seems to me (i might be wrong), that system selects source ports
> > for nat in a way, that makes collisions likely.
> >
>
> this is very odd. source port for NAT is selected in pf_get_sport()
> function found in pf_lb.c. the relevant part reads as follows:
>
> 182         do {
> 183                 key.af = pd->naf;
> 184                 key.proto = pd->proto;
> 185                 key.rdomain = pd->rdomain;
> ...
> 203                 } else if (low == 0 && high == 0) {
> 204                         key.port[sidx] = pd->nsport;
> ...
> 215                 } else {
> 216                         u_int32_t tmp;
> 217
> 218                         if (low > high) {
> 219                                 tmp = low;
> 220                                 low = high;
> 221                                 high = tmp;
> 222                         }
> 223                         /* low < high */
> 224                         cut = arc4random_uniform(1 + high - low) + low;
> 225                         /* low <= cut <= high */
> 226                         for (tmp = cut; tmp <= high && tmp <= 0xffff;
> ++tmp) {
> 227                                 key.port[sidx] = htons(tmp);
> 228                                 if (pf_find_state_all(&key, dir, NULL)
> ==
> 229                                     NULL && !in_baddynamic(tmp,
> pd->proto)) {
> 230                                         *nport = htons(tmp);
> 231                                         return (0);
> 232                                 }
> 233                         }
> 234                         tmp = cut;
> 235                         for (tmp -= 1; tmp >= low && tmp <= 0xffff;
> --tmp) {
> 236                                 key.port[sidx] = htons(tmp);
> 237                                 if (pf_find_state_all(&key, dir, NULL)
> ==
> 238                                     NULL && !in_baddynamic(tmp,
> pd->proto)) {
> 239                                         *nport = htons(tmp);
> 240                                         return (0);
> 241                                 }
> 242                         }
> 243                 }
>
>     If I understand code in parse.y right, the low is
> PF_NAT_PROXY_PORT_LOW (50001)
>     and high is PF_NAT_PROXY_PORT_HIGH (65535), therefore port selection
> process
>     happens in else branch 216 - 243.  There is a for loop at lines 235 -
> 240
>     which tries really hard to find unused port. The code uses
> pf_find_state_all()
>     to make sure the selected port is unused in state table.
>
>     if this code suggests port number then pf_find_state_all() somehow
> guaranties
>     the port can not be found in table. the odd thing is that
> pf_state_key_attach()
>     reports that port number exists already in table (is in conflict).
>
>     both calls: call to pf_get_sport() and call to pf_state_key_attach()
> happen
>     under PF_LOCK(), so they are atomic nothing else can mess up with
> table.
>
>     after walking through the code I suspect we fail to initialize look up
> for
>     pf_find_state_all() properly, we forgot to compute a toeplitz hash.
> Can you
>     give a try to diff below to see if it fixes NAT?
>
> thanks a lot
> regards
> sashan
>
> --------8<---------------8<---------------8<------------------8<--------
>
> diff --git a/sys/net/pf_lb.c b/sys/net/pf_lb.c
> index a3d8d6333cd..b893dd819df 100644
> --- a/sys/net/pf_lb.c
> +++ b/sys/net/pf_lb.c
> @@ -196,18 +196,24 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r,
>                         /* XXX bug: icmp states dont use the id on both
>                          * XXX sides (traceroute -I through nat) */
>                         key.port[sidx] = pd->nsport;
> +                       key.hash = pf_pkt_hash(key.af, key.proto,
> &key.addr[0],
> +                           &key.addr[1], key.port[0], key.port[1]);
>                         if (pf_find_state_all(&key, dir, NULL) == NULL) {
>                                 *nport = pd->nsport;
>                                 return (0);
>                         }
>                 } else if (low == 0 && high == 0) {
>                         key.port[sidx] = pd->nsport;
> +                       key.hash = pf_pkt_hash(key.af, key.proto,
> &key.addr[0],
> +                           &key.addr[1], key.port[0], key.port[1]);
>                         if (pf_find_state_all(&key, dir, NULL) == NULL) {
>                                 *nport = pd->nsport;
>                                 return (0);
>                         }
>                 } else if (low == high) {
>                         key.port[sidx] = htons(low);
> +                       key.hash = pf_pkt_hash(key.af, key.proto,
> &key.addr[0],
> +                           &key.addr[1], key.port[0], key.port[1]);
>                         if (pf_find_state_all(&key, dir, NULL) == NULL) {
>                                 *nport = htons(low);
>                                 return (0);
> @@ -225,6 +231,9 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r,
>                         /* low <= cut <= high */
>                         for (tmp = cut; tmp <= high && tmp <= 0xffff;
> ++tmp) {
>                                 key.port[sidx] = htons(tmp);
> +                               key.hash = pf_pkt_hash(key.af, key.proto,
> +                                   &key.addr[0], &key.addr[1],
> key.port[0],
> +                                   key.port[1]);
>                                 if (pf_find_state_all(&key, dir, NULL) ==
>                                     NULL && !in_baddynamic(tmp,
> pd->proto)) {
>                                         *nport = htons(tmp);
> @@ -234,6 +243,9 @@ pf_get_sport(struct pf_pdesc *pd, struct pf_rule *r,
>                         tmp = cut;
>                         for (tmp -= 1; tmp >= low && tmp <= 0xffff; --tmp)
> {
>                                 key.port[sidx] = htons(tmp);
> +                               key.hash = pf_pkt_hash(key.af, key.proto,
> +                                   &key.addr[0], &key.addr[1],
> key.port[0],
> +                                   key.port[1]);
>                                 if (pf_find_state_all(&key, dir, NULL) ==
>                                     NULL && !in_baddynamic(tmp,
> pd->proto)) {
>                                         *nport = htons(tmp);
> diff --git a/sys/net/pfvar_priv.h b/sys/net/pfvar_priv.h
> index 741125e447e..147cd7d8f75 100644
> --- a/sys/net/pfvar_priv.h
> +++ b/sys/net/pfvar_priv.h
> @@ -406,6 +406,9 @@ void                        pf_state_peer_hton(const
> struct pf_state_peer *,
>                             struct pfsync_state_peer *);
>  void                   pf_state_peer_ntoh(const struct pfsync_state_peer
> *,
>                             struct pf_state_peer *);
> +u_int16_t              pf_pkt_hash(sa_family_t, uint8_t,
> +                           const struct pf_addr *, const struct pf_addr *,
> +                           uint16_t, uint16_t);
>
>  #endif /* _KERNEL */
>
>

Reply via email to