On Thu, Mar 09, 2017 at 12:50:16PM +0530, Krishna Kumar (Engineering) wrote:
> 1. About 'retries', I am not sure if it works for connect() failing
> synchronously on the local system (as opposed to getting a timeout/refused
> via callback).

Yes it normally does. I've been using it for the same purpose in certain
situations (eg: binding to a source port range while some daemons are
later bound into that range).
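
For reference, this is roughly the kind of setup I mean (names and
addresses are made up), where outgoing connections are bound to a restricted
source port range and "retries" absorbs occasional local bind/connect
failures:

    backend be_app
        retries 3
        # bind outgoing connections to a fixed address and port range
        source 10.0.0.10:40000-40399
        server app1 192.168.10.1:8080 check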

> The document on retries says:
> 
> "   <value>   is the number of times a connection attempt should be retried on
>               a server when a connection either is refused or times out. The
>               default value is 3.
> "
> 
> The two conditions above don't fall in our use case.

It's still a refused connection :-)

> The way I understood it was that retries happen during the callback
> handler. Also I am not sure if there is any way to circumvent the
> "1 second" gap before a retry.

Hmmm, I have to check. In fact, when the LB algorithm is not deterministic,
we immediately retry on another server. If we're supposed to end up only on
the same server, we indeed apply the delay. But if it's a synchronous error,
I don't know. And I think that's one case (especially -EADDRNOTAVAIL) where
we should retry immediately.
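
To illustrate the "retry elsewhere" case, here's a rough sketch (made-up
names): with a non-deterministic algorithm such as roundrobin a retry may
land on another server, and "option redispatch" additionally allows
persistent sessions to be redistributed after a connection failure. Note
that this doesn't by itself remove the 1-second delay that applies when we
must stay on the same server:

    backend be_app
        balance roundrobin
        retries 3
        # allow breaking persistence so a retried connection can go elsewhere
        option redispatch
        server app1 192.168.10.1:8080 check
        server app2 192.168.10.2:8080 check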

> 2. For nolinger, it was not recommended in the document,

It's indeed strongly recommended against, mainly because we've started to
see it in configs copy-pasted from blogs by people who don't understand the
impact.

> and also I wonder if any data loss can happen if the socket is not lingered
> for some time beyond the FIN packet that the remote server sent for doing
> the close(), delayed data packets, etc.

The data loss happens only with outgoing data, so for HTTP it's the data
sent to the client that is at risk. Data coming from the server are properly
consumed. In fact, when you configure "http-server-close", nolinger is
automatically enabled on your backend so that haproxy can close the server
connection without accumulating TIME_WAIT sockets.
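
For reference, this is the kind of configuration I'm referring to (the
timeouts are just placeholders): with "option http-server-close", haproxy
closes the server-side connection after each response, and lingering is
disabled on that side so the closed connections don't accumulate in
TIME_WAIT:

    defaults
        mode http
        # close the server-side connection after each response
        option http-server-close
        timeout connect 5s
        timeout client  30s
        timeout server  30s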

> 3. Ports: Actually each HAProxy process is limited to 400 ports towards a
> single backend, and there are many haproxy processes on this and other
> servers. The ports are split per process and per system. E.g. system1 has
> 'n' processes, each with a port range separate from the others, and system2
> has 'n' processes with a completely different port range. For infra reasons,
> we are restricting the total port range. The unique ports for the different
> haproxy processes running on the same system are there to avoid two
> processes attempting to use the same port (the first port# in the range)
> and failing in connect() when connecting to the same remote server. Hope I
> explained that clearly.

Yep, I clearly see the use case. That's one of the rare cases where it's
interesting to use SNAT between your haproxy nodes and the internet. This
way you'll use a unified port pool for all your nodes and will not have to
reserve port ranges per system and per process. Each process will then share
the system's local source ports, and each system will have a different
address. Then the SNAT will convert these IP1..N:port1..N to the public IP
address and an available port. This will offer you more flexibility to add
or remove nodes/processes etc. Maybe your total traffic cannot pass through
a single SNAT box, though, in which case I understand that you don't have
much choice. However, you could then at least avoid forcing a port range per
process and instead fix the system's local port range, so that you know all
the processes of a single machine share the same port range. That's already
better because you won't be forcing the reuse of ports still held by
unfinished connections.
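
Roughly, that last suggestion looks like this (addresses and names are made
up): no per-process "source" port range, only a per-machine source address,
so every process on the machine draws its ports from the kernel's local
range (net.ipv4.ip_local_port_range), which you can still restrict globally
if needed:

    backend be_app
        # per-machine source address, no per-process port range
        source 10.0.0.10
        server app1 203.0.113.10:443 check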

Willy
