Hi Bryan,

On Wed, Dec 05, 2012 at 04:22:45PM +0100, Bryan Berry wrote:
> dear friends in haproxy,
> 
> We are seeing intermittently high CPU usage on many of our haproxy
> instances. When the CPU usage is high, it is roughly 60% User and 40%
> System, out of the total 99% of available CPU capacity that haproxy is using

Oh crap, I thought all of these were definitely fixed :-(

> When the cpu usage is not "high", it is down to around 1-2%
> 
> We are seeing at most 35 simultaneous connections

Does this stay that way for a long time ? I mean, could it be something
like a health check not getting a response (eg: just a few seconds) or
does that seem to match your client/server timeout (500s in your case) ?

> I am quite confident that my load balancing of databases using TCP mode is
> causing this problem. I believe so because at one point I split my haproxy
> instance into two separate ones. One handled http and the other tcp for our
> databases. Only the haproxy instance handling the databases demonstrated this
> problem.

Don't worry, you didn't necessarily make a mistake. What you observed is a
bug in haproxy and we need to fix it. However it's clearly possible that
your workload triggers it more easily than any other one.

I really appreciate all the detailed information you provided, that's
quite useful.

> from myhaproxy:22002
> 
> General process information
> 
> pid = 18656 (process #1, nbproc = 1)
> uptime = 0d 4h40m14s
> system limits: memmax = unlimited; ulimit-n = 8259
> maxsock = 8259; maxconn = 4096; maxpipes = 0
> current conns = 42; current pipes = 0/0; conn rate = 0/sec
> Running tasks: 1/97; idle = 47 %

The only running task is the one showing the stats so the issue is an
I/O event that did not wake its task up.

> strace results
> 
> [root@foobar haproxy-1.5-dev14]# strace -c -p $(pidof haproxy)
> Process 14208 attached - interrupt to quit
> ^CProcess 14208 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  98.46    0.016793           0   4970121           epoll_wait
>   0.79    0.000135           0       598           close
>   0.43    0.000074           0      1728      1153 connect
>   0.13    0.000023           0      1215           epoll_ctl
>   0.08    0.000014           0       578           socket
>   0.05    0.000008           0       956       772 recvfrom
>   0.05    0.000008           0      1872           setsockopt
>   0.00    0.000000           0        40        20 accept
>   0.00    0.000000           0       260        13 sendto
>   0.00    0.000000           0        20           shutdown
>   0.00    0.000000           0       598           fcntl
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.017055               4977986      1958 total
> 
> 
> the number of epoll_wait calls here is insane! Any assistance possible here
> would be much appreciated. I imagine that somehow I have misconfigured my
> haproxy instance.

Could you please add "level admin" on your stats socket, restart and issue
a "show sess all" on the stats socket when the issue happens, and capture
the output. It will help *a lot*. The best way to do it is to redirect it
to a file, for example like this :

   echo "show sess all" | socat stdio /var/run/haproxy.sock > show-sess.out
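
For the first step, here is a minimal sketch of the relevant configuration
line. It assumes your stats socket lives at the same /var/run/haproxy.sock
path as in the command above, so adjust the path to whatever your "global"
section already declares:

```
global
    stats socket /var/run/haproxy.sock level admin
```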

This output will contain detailed information such as internal addresses.
You can mask them as I don't need them to debug, but the flags and the
other fields are extremely important.
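
If masking by hand is tedious, something along these lines might do it. This
is only a sketch: "mask_addrs" is a hypothetical helper name, it assumes a
sed that supports -E, and it only rewrites IPv4 addresses (IPv6 addresses
and host names would still need manual masking). Ports, flags and all other
fields are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: replace every dotted-quad IPv4 address read on stdin
# with x.x.x.x, leaving ports, flags and every other field as they are.
mask_addrs() {
    sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g'
}

# Usage, with the same socket path as in the command above:
#   echo "show sess all" | socat stdio /var/run/haproxy.sock | mask_addrs > show-sess.out
```

Please have a quick look at the resulting file before sending it, to make
sure nothing other than addresses was altered.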

Thanks!
Willy