On Wed, Dec 5, 2012 at 10:58 PM, Willy Tarreau <[email protected]> wrote:
> Hi Bryan,
>

Thanks a lot for your help, Willy, I really appreciate it. And thanks for haproxy, it is a fantastic tool.

> On Wed, Dec 05, 2012 at 04:22:45PM +0100, Bryan Berry wrote:
>
> Does this stay that way for a long time ? I mean, could it be something
> like a health check not getting a response (eg: just a few seconds) or
> does that seem to match your client/server timeout (500s in your case) ?
>

It does stay high. Here is a graph of CPU usage over the last 24 hours; the left-hand axis is % of CPU time:

https://docs.google.com/open?id=0BzPvBvLIIq7NV0QtTkliM3Yxenc

The high CPU usage doesn't appear to correlate with any HTTP 500 status codes, and I wouldn't expect it to, since it seems related to the TCP-mode proxying of our databases.

> Could you please add "level admin" on your stats socket, restart and issue
> a "show sess all" on the stats socket when the issue happens, and capture
> the output. It will help *a lot*. The best way to do it is to redirect it
> to a file, for example like this :
>
>    echo "show sess all" | socat stdio /var/run/haproxy.sock > show-sess.out
>

Done:

https://docs.google.com/document/d/1A3qEq0RmlAtG-fzKJDbZgB0pvmYJnlUuJ0T2IrpBGGg/edit

Here are the IP addresses of the database backend servers. Note that they are not the originals; they have been munged to protect the innocent.

168.100.2.181, 168.100.2.237, 168.100.2.195, 168.100.2.183

Just by playing with strace, it looks like the following call is being made over and over again with a value of 0 for wait_time:

status = epoll_wait(0, {}, 26, 0)

(line 133, ev_epoll.c)

Hope this helps! Thanks again for your assistance.
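
P.S. In case socat is not installed on the box, here is a minimal Python sketch of the same capture step: it sends one command to the stats socket and reads the reply until haproxy closes the connection. The function name haproxy_command and the timeout value are my own choices for illustration, and it assumes the socket is at /var/run/haproxy.sock with "level admin" set as suggested above. I have not tried it against every haproxy version.

```python
import socket


def haproxy_command(sock_path, command, timeout=5.0):
    """Send one command to an HAProxy stats socket and return the reply as text.

    Hypothetical helper: assumes a UNIX stream socket that answers a single
    newline-terminated command and then closes the connection.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(sock_path)
        # Commands on the stats socket are terminated by a newline.
        s.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example use (path and file name taken from the socat command above):
#   out = haproxy_command("/var/run/haproxy.sock", "show sess all")
#   with open("show-sess.out", "w") as f:
#       f.write(out)
```

Running the commented lines at the bottom while the CPU is pegged should leave the same show-sess.out file that the socat one-liner produces.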

