Hi dev team,
It's no secret that when an application establishes a new TCP connection
without first binding the socket to a specific local interface address, the
OS handles that automatically.
Unfortunately there is a catch: the local port is allocated by different
logic depending on whether (1) the socket is bound before connect() or (2) it
is not. When in_pcb_lport() allocates the port, it checks whether candidate
ports are free using in_pcblookup_local(), and the behaviour is as follows:
(1) Bound, i.e. laddr is set to a specific address:
Port is considered occupied only if there is a PCB that matches both laddr
and lport.
(2) Not bound, i.e. laddr == INADDR_ANY:
Port is considered occupied if there is any PCB that matches lport alone.
This means that, in order to allocate a port, none of the available local
addresses may have it allocated, even though this requirement is ridiculous,
since we are allocating only one PCB.
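
To make the difference between (1) and (2) concrete, here is a toy model of
the check as described above. It is not the kernel code: struct toy_pcb,
ANY_ADDR and toy_port_occupied() are names made up for illustration, and
everything that in_pcblookup_local() really does beyond comparing laddr and
lport is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ADDR 0u /* stand-in for INADDR_ANY in this toy model */

/* Hypothetical, simplified PCB: only the local address/port pair. */
struct toy_pcb {
    uint32_t laddr;
    uint16_t lport;
};

/*
 * Toy version of the "is this port occupied?" check described above.
 * With a specific laddr, only an exact (laddr, lport) match counts.
 * With ANY_ADDR, any PCB using lport counts, whatever its address.
 */
static bool
toy_port_occupied(const struct toy_pcb *pcbs, size_t n,
    uint32_t laddr, uint16_t lport)
{
    assert(pcbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pcbs[i].lport != lport)
            continue;
        if (laddr == ANY_ADDR || pcbs[i].laddr == laddr)
            return true;
    }
    return false;
}
```

With a single PCB on (10.0.0.1, 50000), a lookup for 10.0.0.2 sees port 50000
as free, while a lookup with ANY_ADDR sees it as occupied.
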
Looking through the code, it seems that (2) is due to the fact that
tcp_connect() first allocates the port, indirectly through the call to
in_pcbbind(), and only then selects the actual local address, also
indirectly, through the call to in_pcbconnect_setup(), which in turn calls
in_pcbladdr(). So, probably, in order to guarantee that in_pcbconnect_setup()
will not fail, we make sure that the port is free on the whole range of local
addresses, no matter which one of them is actually selected by in_pcbladdr()?
In the real world this creates serious problems for servers that have a lot
of outgoing connections, for example an nginx proxy with a lot of open HTTP/2
connections. To avoid this limitation we have created workarounds in the
nginx config as well as in our own software, basically by having 50 local
addresses and only following scenario (1). Alas, all of the built-in Unix
utilities, as well as other software, always follow scenario (2). As a
result, given a large number of connections, there may be points in time when
every port in the range is occupied on at least one local address.
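
For software that can be changed, following scenario (1) amounts to calling
bind() with a specific local address and port 0 before connect(). A minimal
sketch, assuming IPv4; the helper name connect_from() and its parameters are
made up for illustration, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Connect to dst_ip:dst_port from a specific local address.
 * Port 0 in bind() leaves the choice of local port to the kernel.
 * Returns a connected socket, or -1 on any error.
 */
static int
connect_from(const char *local_ip, const char *dst_ip, uint16_t dst_port)
{
    struct sockaddr_in local, dst;
    int s;

    assert(local_ip != NULL && dst_ip != NULL);

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);
    if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(dst_port);
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    /* Bind first, so the socket already has a specific laddr at connect(). */
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

The caller still has to choose which of its local addresses to pass in, for
example by rotating over them.
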
Even worse is the outcome of such a condition: while in_pcb_lport() walks
over the range of possible port numbers, making a myriad of calls to
in_pcblookup_local(), some important lock is being held within the kernel. So
important that it leads to a complete lockup of the system. Even direct
terminal access is unavailable: the console is unresponsive. The more
connect() calls go through scenario (2), the longer it takes the system to
unfreeze. In some circumstances the only option is a hard reset.
Is it possible to somehow update the code that does connect via scenario (2)
to enable more intelligent port allocation, for example by allocating the
local address and port simultaneously?
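
As a rough illustration of what I mean, again as a toy model and not a patch:
assume the local address has already been chosen (the part in_pcbladdr() does
today), and only then search for a port that is free for that one address.
struct toy_pcb, toy_pair_in_use() and toy_alloc_port_for() are hypothetical
names, and the linear scan is just the simplest way to write the search; it
is not a claim about how the kernel picks candidate ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PCB: only the local address/port pair. */
struct toy_pcb {
    uint32_t laddr;
    uint16_t lport;
};

/* True if some PCB already uses exactly this (laddr, lport) pair. */
static bool
toy_pair_in_use(const struct toy_pcb *pcbs, size_t n,
    uint32_t laddr, uint16_t lport)
{
    for (size_t i = 0; i < n; i++) {
        if (pcbs[i].laddr == laddr && pcbs[i].lport == lport)
            return true;
    }
    return false;
}

/*
 * Address-first allocation: laddr is already known, so a port is
 * rejected only if it is taken on that address.
 * Returns a port in [first, last], or 0 if none is free for laddr.
 */
static uint16_t
toy_alloc_port_for(const struct toy_pcb *pcbs, size_t n,
    uint32_t laddr, uint16_t first, uint16_t last)
{
    assert(first != 0 && first <= last);
    /* 32-bit counter so the loop terminates even when last == 65535. */
    for (uint32_t p = first; p <= last; p++) {
        if (!toy_pair_in_use(pcbs, n, laddr, (uint16_t)p))
            return (uint16_t)p;
    }
    return 0;
}
```

In this model a port that is busy on 10.0.0.2 stays available to a new
connection from 10.0.0.1, which is exactly what scenario (2) does not allow
today.
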
_______________________________________________
freebsd-net mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net