Hi Bryan,

On Wed, Dec 05, 2012 at 04:22:45PM +0100, Bryan Berry wrote:
> dear friends in haproxy,
>
> We are seeing intermittently high CPU usage on many of our haproxy
> instances, when the cpu usage is high it is roughly 60% User and 40%
> System, out of the total 99% of available cpu capacity that haproxy is using

Oh crap, I thought all of these were definitely fixed :-(

> When the cpu usage is not "high", it is down to around 1-2%
>
> We are seeing at most 35 simultaneous connections

Does this stay that way for a long time? I mean, could it be something
like a health check not getting a response (eg: just a few seconds) or
does that seem to match your client/server timeout (500s in your case)?

> I am quite confident that my load balancing of databases using TCP mode is
> causing this problem. I believe so because at one point I split my haproxy
> instance into two separate ones, One handled http and the other tcp for our
> databases. Only haproxy instance handling the databases demonstrated this
> problem.

Don't worry, you didn't necessarily make a mistake. What you observed is a
bug in haproxy and we need to fix it. However it's clearly possible that
your workload triggers it more easily than any other one. I really
appreciate all the detailed information you provided, that's quite useful.

> from myhaproxy:22002
>
> General process information
>
> *pid = *18656 (process #1, nbproc = 1)
> *uptime = *0d 4h40m14s
> *system limits:* memmax = unlimited; ulimit-n = 8259
> *maxsock = *8259; *maxconn = *4096; *maxpipes = *0
> current conns = 42; current pipes = 0/0; conn rate = 0/sec
> Running tasks: 1/97; idle = 47 %

The only running task is the one showing the stats, so the issue is an I/O
event that did not wake its task up.

> strace results
>
> [root@foobar haproxy-1.5-dev14]# strace -c -p $(pidof haproxy)
> Process 14208 attached - interrupt to quit
> ^CProcess 14208 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  98.46    0.016793           0   4970121           epoll_wait
>   0.79    0.000135           0       598           close
>   0.43    0.000074           0      1728      1153 connect
>   0.13    0.000023           0      1215           epoll_ctl
>   0.08    0.000014           0       578           socket
>   0.05    0.000008           0       956       772 recvfrom
>   0.05    0.000008           0      1872           setsockopt
>   0.00    0.000000           0        40        20 accept
>   0.00    0.000000           0       260        13 sendto
>   0.00    0.000000           0        20           shutdown
>   0.00    0.000000           0       598           fcntl
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.017055             *4977986 *     1958 total
>
>
> the amount of epoll_wait calls here is insane! Any assistance possible here
> would be much appreciate. I imagine that somehow I have misconfigured my
> haproxy instance.

Could you please add "level admin" on your stats socket, restart and issue
a "show sess all" on the stats socket when the issue happens, and capture
the output. It will help *a lot*. The best way to do it is to redirect it
to a file, for example like this:

    echo "show sess all" | socat stdio /var/run/haproxy.sock > show-sess.out

This output will contain detailed information such as internal addresses.
You can mask them as I don't need them to debug. But the flags etc... are
extremely important.

Thanks!
Willy
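
For reference, the stats socket change requested above would look roughly
like this in the global section. This is only a sketch: it assumes the
socket lives at /var/run/haproxy.sock, the path used in the socat example,
so adapt it to whatever your configuration already declares.

```
global
    # assumed path, taken from the socat example above;
    # keep your existing socket path if it differs
    stats socket /var/run/haproxy.sock level admin
```

After the restart, the same socat command should then return the full
"show sess all" output.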

