Hello,

This is a relatively new setup (under a week old), but I had problems
yesterday as load increased.

The problem was reproducible in both the latest 1.2 (which I tried after the
problems started) and 1.3.15.7.  It showed up with large values for clitimeout
(and also srvtimeout, but to a lesser extent).

The server would close the connection, which I think ended up in CLOSE_WAIT
on the haproxy machine, and haproxy would still count it as a connection for
a long time.  The connection would be fully closed on the actual servers.

That caused the number of tracked connections to run high, but that's not
really the problem.  The main problem is that after a while under heavy load,
the Conn count under Errors on the backend line would increase, but none of
the servers would show any increase in errors or warnings, and none of the
session counts for the servers or the backend were at their max.  The
connection failure rate would start slow and quickly speed up.  I also
noticed higher latency prior to failure, and the CPU seemed to go to 100% at
the same time, instead of the usual few percent.

It's as if I were reaching some limit, but from what I could tell no limit
was being reached.

After much head scratching and several failures, I set both clitimeout and
srvtimeout to 24000, which is closer to how long the server waits before
closing an idle connection (previously clitimeout was 30 min).  I'm just
curious what limit I may have reached, and why it didn't seem to be logged
decently anywhere.  Current connections are probably at about 10% of what I
was seeing before adjusting the timeouts down, but there is nothing in the
log, etc.
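
A minimal sketch of the timeout change described above.  It assumes the
settings live in a defaults section, which is a guess about the layout; only
the two 24000 values come from this message.  haproxy 1.x reads bare timeout
values as milliseconds, so 24000 is 24 seconds.

```
defaults
    # Assumed placement; the same keywords also work in a listen section.
    # 24000 ms = 24 s, down from 30 min (1800000 ms) for clitimeout.
    clitimeout  24000
    srvtimeout  24000
```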

 

I would expect that if the server side closes, haproxy would close its client
side, and so it wouldn't hurt to have an extra long timeout.  Obviously that's
not the case.  I am working OK for now, but am a little concerned about the
backend errors, where haproxy didn't try a server, didn't log anywhere as to
why, and AFAIK didn't reach any limits.
