Re: Fix for rare EADDRNOTAVAIL error

Denis Malyshkin Wed, 05 Feb 2014 21:37:17 -0800

Hello Willy,

Thank you for the explanation and suggestions.
I've re-checked logs and connections.

1. There are no TIME_WAIT connections on our server. They may appear for a very short time, but there are no long-waiting ones. So in that respect our system works well.

2. What is the connection retry mechanism you mentioned? Is it a haproxy or a system mechanism?

3. With my re-connect loop the second try was always successful. Does it mean that without my loop the connection retry mechanism will also successfully re-connect, and such log errors may be completely ignored?

4. The main question. If the above is right and such error messages are completely harmless, why are such errors logged here while all other connect errors aren't? Such logging worries our admins (and me), and so we started to investigate and try to fix them. Might it be better to remove these log messages, or move them somewhere higher, to the point where the connection retry mechanism decides that all reconnect tries are unsuccessful? Do you have any reasons to leave them just here?

5. We see "Connect() failed...: no free ports." errors 20-70 times per day (depending on server load). Can you think of any reasons why such errors may occur? haproxy has only about 500-700 open connections, there are no "dead" ones, all are in ESTABLISHED state.


Thank you a lot for your help!

Hello Denis,

On Tue, Feb 04, 2014 at 12:10:05PM +0700, Denis Malyshkin wrote:
Hello all,
We have used haproxy for several months, and periodically see the following error messages in the log:
============================================================================
Sep 27 16:17:06 localhost haproxy[12874]: Connect() failed for backend https: no free ports.
============================================================================
I've investigated this issue and found that the EADDRNOTAVAIL error is returned sometimes. Probably it is caused by the fact that we are using one port from the ephemeral range for our internal needs. According to http://en.wikipedia.org/wiki/Ephemeral_port the 'connect' function usually just uses a round-robin algorithm to choose the next ephemeral port, so when it encounters an already used port it just produces the above error.
The solution for this issue is simple -- add a loop around connect. We have implemented it and tested it in our environment. It works for us. Maybe it will be good enough to include in the core haproxy...
The logic of the solution is simple -- try to connect 3 times in case of EAGAIN, EADDRINUSE or EADDRNOTAVAIL errors:
You should not need to do this, this will naturally be handled by the
connection retry mechanism for the number of configured retries. Also,
your method will not work with explicit source port ranges, because it
will insist on reusing the same source, while the retries method will
automatically pick another one.
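
For readers following the thread without the patch itself, the loop described above can be sketched roughly as follows. This is only an illustration in Python, not the C code that was proposed for haproxy; the names connect_with_retry, RETRYABLE and the make_socket parameter are hypothetical. Note that this sketch opens a fresh, unbound socket on each attempt, so the kernel is free to pick another ephemeral port; with an explicitly bound source port a loop around connect alone would keep hitting the same source, which is the limitation pointed out above.

```python
import errno
import socket

# Error codes the proposal suggests retrying on.
RETRYABLE = {errno.EAGAIN, errno.EADDRINUSE, errno.EADDRNOTAVAIL}


def connect_with_retry(address, attempts=3, make_socket=socket.socket):
    """Try to connect up to `attempts` times, retrying only on RETRYABLE errors.

    `make_socket` is a hypothetical hook so the loop can be exercised
    without a real network.
    """
    last_error = None
    for _ in range(attempts):
        # A new socket per attempt: no explicit bind, so the kernel
        # chooses the source port again on every try.
        sock = make_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect(address)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno not in RETRYABLE:
                raise  # any other connect error is reported immediately
            last_error = exc
    # Every attempt failed with a retryable error: report the last one.
    raise last_error
```

A caller would use it as connect_with_retry(("192.0.2.10", 443)), where the address is a placeholder.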

BTW, if you're seeing this problem, I suspect you're running a bogus
protocol such as Redis where the client closes first, causing the local
ports to remain in TIME_WAIT state for some time and not being reusable.
If this is the case, you should put an "option nolinger" in the backend
section (don't put it in the frontend!). That way it will tell the system
to flush whatever data may remain upon close and will get rid of the
TIME_WAIT. Otherwise, under moderate load, you can end up with no more
free ports at all and your workaround will not work anymore.
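
As a rough illustration of the suggestion above, a backend section with the option enabled might look like the fragment below. The backend name is taken from the log line quoted earlier; the server name and address are placeholders, and "retries 3" simply spells out what is, as far as I know, the default value.

```
backend https
    # number of connection retries (3 is the documented default)
    retries 3
    # reset the connection on close so the local port does not sit in TIME_WAIT
    option nolinger
    # hypothetical server line; name and address are placeholders
    server app1 192.0.2.10:443
```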

Best regards,
Willy



--
Best regards,
 Denis Malyshkin,
Senior C++ Developer
of ISS Art, Ltd., Omsk, Russia.
Mobile Phone: +7 913 669 2896
Office tel/fax +7 3812 396959
Yahoo Messenger: dmalyshkin
Web: http://www.issart.com
E-mail: [email protected]
