Cyril, thanks for your response.

On Jul 26, 2012, at 4:46 PM, Cyril Bonté wrote:
>
> Please add "log global" in your backend sections (or in your defaults),
> this explains why your log files didn't give you any indication.
> After adding this line everywhere (not only in the frontend), you'll see when
> servers go UP and DOWN, and why.
> This will also probably help us to know what happens.

Done. That helps - we tend to be a bit minimal on most logging for compliance reasons, but that's certainly going to help.

> From this previous log line, it looks like something becomes slow (your
> haproxy server or your backend servers).

That's the odd piece, because the server logs don't seem to indicate any issues (and this was last seen shortly after a restart). Hopefully the logging will shed more light on the subject.

> Wow, are you sure you really want to use such a big buffer size ? Also,
> ensure you're running the last stable version of HAProxy (currently 1.4.21),
> which fixes a major bug when using a larger buffer size (it doesn't explain
> what you observe but it's an advise for more stability).

Unfortunately yes; we are supporting some rare but critical very large HTTP GET requests over a REST API. I'll look into upgrading shortly.

> For more details :
> http://haproxy.1wt.eu/git?p=haproxy-1.4.git;a=commit;h=30297cb17147a8d339eb160226bcc08c91d9530b

Good to know!

> As said at the beginning, please add :
> log global

Added. Is there a log level configuration that outputs exactly what the defaults do, but without the stats connection lines?

> If using cookies is not an issue for your clients, I'd recommend you not to
> use "appsession" but "cookie insert" or "cookie prefix" instead.
>
> http://cbonte.github.com/haproxy-dconv/configuration-1.4.html#4-cookie

We'll probably leave this one as it is, because I don't believe it's giving us any issues at the moment; the client problem definitely coincided with the backend being marked as DOWN rather than with a cookie issue (the cookie issue was my first guess, to be honest).

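For reference, the "cookie insert" alternative suggested above would look roughly like the sketch below. This is not our running config: the backend name "gui", the cookie name SERVERID and the per-server cookie values are assumptions made up for illustration; only the server addresses come from this thread.

```
backend gui                                  # hypothetical backend name
    # HAProxy sets its own persistence cookie instead of tracking the
    # application's session cookie the way "appsession" does
    cookie SERVERID insert indirect nocache  # cookie name is an assumption
    server gui1 172.25.200.53:8080 cookie gui1 check maxconn 2000
    server gui2 172.25.200.54:8080 cookie gui2 check maxconn 2000
```
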
>> server gui1 172.25.200.53:8080 check maxconn 2000
>> server gui2 172.25.200.54:8080 check maxconn 2000
>
> You didn't provide any "timeout check" nor "inter" value.
> The default will be 2 seconds, which is maybe too low for your case.

It shouldn't be - our healthcheck page is fairly simple and basically just makes sure that our webapp is responding to requests (barely more than a static file). I've upped "timeout check" to 10000 though, so we'll see if that makes a difference.

> Hope this helps.

It did, and thank you for looking at this. I've learned an awful lot about haproxy configuration setups (good and bad) from this list!

-Richard
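
P.S. For anyone finding this thread later, the logging and health-check changes discussed above amount to roughly the following. Treat it as a sketch rather than the exact production config: the syslog target, the backend name "gui" and the "inter" value are assumptions for illustration; "log global", "timeout check 10000" and the two server lines are the parts taken from this thread.

```
global
    # assumption: a local syslog daemon accepting UDP on 127.0.0.1
    log 127.0.0.1 local0

defaults
    mode http
    log global              # inherited by every frontend and backend
    option httplog
    timeout check 10000     # milliseconds; the value mentioned above

backend gui                 # hypothetical backend name
    # "inter 5000" is illustrative only; without it the check
    # interval stays at the 2 second default mentioned above
    server gui1 172.25.200.53:8080 check inter 5000 maxconn 2000
    server gui2 172.25.200.54:8080 check inter 5000 maxconn 2000
```

With "log global" in the defaults section, server UP/DOWN transitions and the reason for each failed check should show up in the same log stream as the frontend traffic.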