Hello Willy,
Thank you for the explanation and suggestions.
I've rechecked the logs and connections.
1. There are no TIME_WAIT connections on our server. They may appear for
a very short time, but there are no long-lived ones. So in that respect
our system works well.
2. What is the connection retry mechanism you mentioned? Is it a haproxy
mechanism or a system one?
3. With my reconnect loop, the second attempt was always successful. Does
this mean that, without my loop, the connection retry mechanism will also
reconnect successfully, so these log errors can be safely ignored?
4. The main question: if the above is right and these error messages are
completely harmless, why are they logged here while all other connect
errors aren't? This logging worries our admins (and me), which is why we
started to investigate and try to fix it. Would it be better to remove
these log messages, or move them up to the point where the connection
retry mechanism decides that all reconnect attempts have failed? Do you
have any reason to keep them where they are?
5. We see "Connect() failed...: no free ports." errors 20-70 times per
day (depending on server load). Can you think of any reason why such
errors may occur? haproxy has only about 500-700 open connections; there
are no "dead" ones, all are in the ESTABLISHED state.
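For points 1 and 5, one way to repeat the connection-state check on Linux is to count sockets by state in /proc/net/tcp. This is only a sketch under stated assumptions: it is Linux-specific, it covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6), and the helper name count_tcp_state is hypothetical. The "st" column is a hex code; in the Linux kernel 0x01 is ESTABLISHED and 0x06 is TIME_WAIT.

```c
#include <stdio.h>

/* Linux TCP state codes as printed in the "st" column of /proc/net/tcp. */
enum { TCP_ST_ESTABLISHED = 0x01, TCP_ST_TIME_WAIT = 0x06 };

/* Hypothetical helper: counts the entries of a /proc/net/tcp-style stream
 * whose state column equals `state`. The first line is a header and is
 * skipped. Returns the number of matching sockets. */
static int count_tcp_state(FILE *f, unsigned state)
{
    char line[512];
    int count = 0;

    if (fgets(line, sizeof line, f) == NULL)
        return 0; /* empty input: no header, no entries */

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned st;
        /* Columns are: "sl:" local_address rem_address st ...
         * Skip the first three and read the state as hex. */
        if (sscanf(line, " %*s %*s %*s %x", &st) == 1 && st == state)
            count++;
    }
    return count;
}
```

For example, opening /proc/net/tcp with fopen() and calling count_tcp_state(f, TCP_ST_TIME_WAIT) gives the number of IPv4 sockets currently in TIME_WAIT.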
Thanks a lot for your help!
Hello Denis,
On Tue, Feb 04, 2014 at 12:10:05PM +0700, Denis Malyshkin wrote:
Hello all,
We have been using haproxy for several months and periodically see the
following error message in the log:
============================================================================
Sep 27 16:17:06 localhost haproxy[12874]: Connect() failed for backend
https: no free ports.
============================================================================
I've investigated this issue and found that connect() sometimes returns
EADDRNOTAVAIL.
This is probably caused by the fact that we use one port from the
ephemeral range for our internal needs.
According to http://en.wikipedia.org/wiki/Ephemeral_port, the connect()
function usually just uses a round-robin algorithm to choose the next
ephemeral port, so when it encounters a port that is already in use, it
just produces the above error.
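To check whether the port used for internal needs really falls inside the ephemeral range, the range can be read on Linux from /proc/sys/net/ipv4/ip_local_port_range, which holds two numbers: the lowest and the highest port. The sketch below assumes that file format; read_port_range and port_in_range are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: parses "low high" as found in
 * /proc/sys/net/ipv4/ip_local_port_range.
 * Returns false on malformed or out-of-range input. */
static bool read_port_range(FILE *f, unsigned *low, unsigned *high)
{
    return fscanf(f, "%u %u", low, high) == 2
        && *low <= *high && *high <= 65535;
}

/* True if `port` lies inside the inclusive range [low, high]. */
static bool port_in_range(unsigned port, unsigned low, unsigned high)
{
    return port >= low && port <= high;
}
```

Since the range is inclusive, it contains high - low + 1 ports.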
The solution for this issue is simple: add a loop around connect(). We
have implemented it and tested it in our environment, and it works for
us. Perhaps it would be good enough to include in the haproxy core.
The logic of the solution is simple: try to connect up to 3 times in
case of EAGAIN, EADDRINUSE or EADDRNOTAVAIL errors.
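A minimal sketch of the loop described above, assuming a plain blocking IPv4 socket rather than haproxy's non-blocking connection code (where connect() normally returns EINPROGRESS). It is not the actual patch; connect_ipv4_with_retries and is_port_exhaustion_error are hypothetical names. Each attempt uses a fresh socket, because POSIX leaves the state of a socket unspecified after a failed connect().

```c
#include <errno.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* The errno values the loop treats as transient local-port failures. */
static bool is_port_exhaustion_error(int err)
{
    return err == EAGAIN || err == EADDRINUSE || err == EADDRNOTAVAIL;
}

/* Hypothetical helper: connects to addr:port (both in host byte order),
 * making up to max_tries attempts while connect() fails with one of the
 * errors above. Any other error stops the loop immediately.
 * Returns the connected fd, or -1 with errno set. */
static int connect_ipv4_with_retries(uint32_t addr, uint16_t port, int max_tries)
{
    struct sockaddr_in sin = {0};
    int err = EINVAL; /* reported if max_tries < 1 */

    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(addr);
    sin.sin_port = htons(port);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        /* A fresh socket per attempt: after a failed connect() the
         * socket's state is unspecified by POSIX. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (const struct sockaddr *)&sin, sizeof sin) == 0)
            return fd;
        err = errno;
        close(fd);
        if (!is_port_exhaustion_error(err))
            break;
    }
    errno = err;
    return -1;
}
```

Note that such a loop always targets the same destination and lets the kernel pick the source port each time; it does not change the source address or port range in any other way.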
You should not need to do this; it will naturally be handled by the
connection retry mechanism, up to the number of configured retries. Also,
your method will not work with explicit source port ranges, because it
will insist on reusing the same source, while the retry mechanism will
automatically pick another one.
BTW, if you're seeing this problem, I suspect you're running a bogus
protocol such as Redis where the client closes first, causing the local
ports to remain in TIME_WAIT state for some time and not being reusable.
If this is the case, you should put an "option nolinger" in the backend
section (don't put it in the frontend!). It tells the system to discard
whatever data may remain upon close, which gets rid of the TIME_WAIT.
Otherwise, under moderate load, you can end up with no free ports at all,
and your workaround will not work anymore.
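At the socket level, "option nolinger" roughly corresponds to setting SO_LINGER with a zero timeout before close(). A minimal sketch, assuming a POSIX system; set_nolinger is a hypothetical helper name, not a haproxy function. With l_onoff = 1 and l_linger = 0, close() drops any unsent data and resets the connection instead of performing the normal FIN exchange, so this end does not enter TIME_WAIT.

```c
#include <sys/socket.h>

/* Hypothetical helper: request an abortive close on fd.
 * l_onoff = 1, l_linger = 0 makes close() discard unsent data and send a
 * RST, so the socket does not go through TIME_WAIT.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_nolinger(int fd)
{
    const struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

The trade-off is that any data still queued on the socket at close time is lost.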
Best regards,
Willy
--
Best regards,
Denis Malyshkin,
Senior C++ Developer
of ISS Art, Ltd., Omsk, Russia.
Mobile Phone: +7 913 669 2896
Office tel/fax +7 3812 396959
Yahoo Messenger: dmalyshkin
Web: http://www.issart.com