Hi Krishna,
On Thu, Mar 09, 2017 at 12:03:19PM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
>
> We use HAProxy as a forward proxy (I know this is not the intended
> application for HAProxy) to access the outside world from within the DC,
> and this requires setting a source port range so that return traffic
> reaches the correct box from which a connection was established. On our
> production boxes we see around 500 "no free ports" errors per day, but
> this can increase to about 120K errors during big sale events. This is
> because connect() fails with EADDRNOTAVAIL: a previously closed socket
> may still be in LAST_ACK state, as the remote server can take some time
> to send the final ACK.
>
> The attached patch reduces the number of errors by attempting more ports,
> if they are available.
>
> Please review, and let me know if this sounds reasonable to implement.
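To make the failure mode concrete, here is a minimal Python sketch of the idea of trying more ports (illustrative only; it is neither HAProxy's code nor the attached patch). Each outgoing socket is bound to a source port from a configured range; when bind() or connect() reports EADDRINUSE or EADDRNOTAVAIL, the next port is tried. The function name `connect_from_port_range` is hypothetical, and the SO_REUSEADDR comment describes Linux behaviour.

```python
import errno
import socket


def connect_from_port_range(dst, src_ip, ports):
    """Connect to dst from the first usable source port in `ports`.

    Ports whose bind() or connect() fails with EADDRINUSE/EADDRNOTAVAIL
    (e.g. the same 4-tuple is still held by a socket in LAST_ACK) are
    skipped; any other error is raised immediately.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # On Linux, SO_REUSEADDR lets us bind a port still held by a closing
        # socket; an exact 4-tuple clash is then reported by connect().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((src_ip, port))
            sock.connect(dst)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno not in (errno.EADDRINUSE, errno.EADDRNOTAVAIL):
                raise
    # Every port in the range was busy: the "no free ports" case.
    raise OSError(errno.EADDRNOTAVAIL, "no free ports")
```

Scanning the range hides transient clashes, but it still fails once every port in the range is blocked, which is why the size of the range matters.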
Well, while the patch looks clean, I'm really not convinced it's the correct
approach. Normally you should simply be using the "retries" parameter to
increase the number of connect retries. There's nothing wrong with setting
it to a really high value if needed. Doesn't that work in your case?
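The effect of a high "retries" value can be sketched as an outer retry loop (Python, illustrative, not HAProxy's implementation). It assumes that `retries` counts extra attempts after the first one, and that source ports come from a FIFO pool where a port released after a failed attempt goes to the tail, so the next attempt uses a different port. `try_connect` is a hypothetical callable standing in for one bind+connect.

```python
import errno
from collections import deque


def connect_with_retries(try_connect, port_pool, retries):
    """Make up to 1 + retries connection attempts.

    try_connect(port) is a hypothetical callable performing one bind+connect
    from `port`; it returns a connection or raises OSError. port_pool is a
    deque used as a FIFO: a port whose attempt failed is released to the
    tail, so the next attempt picks a different port.
    """
    last_exc = OSError(errno.EADDRNOTAVAIL, "no free ports")
    for _ in range(1 + retries):
        if not port_pool:
            break  # every port is currently allocated
        port = port_pool.popleft()
        try:
            return port, try_connect(port)
        except OSError as exc:
            port_pool.append(port)  # release the port to the tail of the pool
            if exc.errno != errno.EADDRNOTAVAIL:
                raise  # in this sketch, only EADDRNOTAVAIL is retried
            last_exc = exc
    raise last_exc
```

With a large enough `retries`, a few ports still held by LAST_ACK sockets only cost some extra attempts instead of a "no free ports" error.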
Also, a few other points:
- when the remote server sends its FIN with the last segment, your
connection ends up in CLOSE_WAIT state. HAProxy then closes as
well, sending a FIN, and your socket ends up in LAST_ACK waiting
for the server's ACK. You may instead ask HAProxy to close
with an RST by setting "option nolinger" in the backend. The port
will then always be free locally. The side effect is that if the
RST is lost, the server (still in LAST_ACK) may answer the SYN of a
new outgoing connection reusing that port with an ACK instead of a
SYN-ACK; your side will respond to it with an RST and send the SYN
again. All connections will still work, with some of them taking
slightly longer (typically 1 second).
- 500 outgoing ports is a very low value. You should keep in mind
that nowadays most servers use 60-second FIN_WAIT/TIME_WAIT
delays (the remote server goes through FIN_WAIT1 and FIN_WAIT2
while waiting for your ACK and then your FIN, and enters TIME_WAIT
once it receives your FIN). So with only 500 ports, you can *safely*
support only 500/60, i.e. about 8 connections per second. Fortunately,
in practice it doesn't work like this, since most of the time
connections are correctly closed. But once things start to go wrong,
you need to understand that you can reach these limits very quickly.
Also, 500 outgoing ports means you don't expect to support more than
500 concurrent connections per proxy, which seems quite low.
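The worst-case figure above is just the port count divided by the time each port stays blocked. A small Python sketch of that arithmetic follows; the wider 1024-65535 range is a hypothetical comparison point, not a value from this thread, and `worst_case_rate` is an illustrative helper name.

```python
def worst_case_rate(num_ports: int, hold_seconds: float) -> float:
    """New connections per second that num_ports can sustain if every closed
    connection keeps its source port unusable (towards the same server)
    for hold_seconds."""
    return num_ports / hold_seconds


# 500 ports, 60 s FIN_WAIT/TIME_WAIT delay on the server side
print(f"{worst_case_rate(500, 60):.1f} conn/s")  # 8.3 conn/s
# hypothetical wider range 1024-65535, i.e. 64512 ports
print(f"{worst_case_rate(65535 - 1024 + 1, 60):.1f} conn/s")  # 1075.2 conn/s
```

The rate scales linearly with the range size, so widening the source port range is the most direct way to raise this bound.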
So normally, what you're experiencing should be dealt with purely
through configuration:
- increase the retries setting
- possibly enable option nolinger (backend only, never on a frontend)
- try to widen the available source port ranges.
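At the socket level, closing with an RST as "option nolinger" does is usually achieved by setting SO_LINGER with a zero timeout before close(). The minimal Python sketch below shows only that socket option (it is not HAProxy's code; `set_nolinger` and `close_with_rst` are hypothetical helper names) and assumes the Linux/BSD layout of struct linger as two ints.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } on Linux and BSD
NOLINGER = struct.pack("ii", 1, 0)


def set_nolinger(sock: socket.socket) -> None:
    """Make close() abort the connection with an RST instead of a FIN."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, NOLINGER)


def close_with_rst(sock: socket.socket) -> None:
    # With l_onoff=1 and l_linger=0, close() discards unsent data and sends
    # an RST, so the local socket never enters LAST_ACK/TIME_WAIT and its
    # source port is free again immediately.
    set_nolinger(sock)
    sock.close()
```

The peer then sees a connection reset rather than an orderly close, which is why this belongs on the server-facing side only.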
Regards,
Willy