Am 11.07.2013 11:45, schrieb Mark Janssen:
> I've noticed that the HAProxy processes occasionally jump to 100% cpu
> load, while the load before and after these peaks is only 3-5%, and the
> traffic is also the same as outside of these cpu-peaks.
>
> I saw a thread about this earlier (april/may), which concluded that
> there was a bug, which was fixed in 1.5-dev19. Since we were running
> dev18 and also experiencing this issue, we upgraded to dev19.
>
> However, on dev19 I'm also seeing these cpu-load peaks surface a few
> times per day.
>
> As a precaution, we have configured nbproc to 7 currently, (8-cores in
> these boxes).
>
> I've been able to get some straces on the processes eating 100%, but
> usually they drop back to 4% after I start the strace.
>
> I did see large amounts of sequential epoll_wait calls in the processes
> with 100% cpu load, and not with the other processes.
>
> epoll_wait(0, {}, 200, 0) = 0
> (repeated 10-15 times)
I've recently posted here [1] about haproxy staying at 100% CPU. The
cause of it probably was haproxy relentlessly trying but not succeeding
to make a connection to forward the traffic.
Could this be a similar problem? I.e. that f.ex. your backend is
saturated and momentarily not accepting new connection while there are
new connections comming in to haproxy?
> Haproxy config (edited)
>
> # Defaults Section
> defaults
> mode http
> timeout connect 5000ms
> timeout client 500000ms
> timeout server 500000ms
> option splice-auto
> option forwardfor
> option log-health-checks
>
> # Global Options
> global
> daemon
> maxconn 50000
> log 192.168.99.10:514 <http://192.168.99.10:514> local1 info
> stats socket /var/run/haproxy.sock uid 0 gid 0 mode 0600 level
> admin
> chroot /var/empty/haproxy
> user haproxy
> group haproxy
> nbproc 7
> node HOSTNAME
> spread-checks 5
> listen stats <deleted>
>
> frontend in-10
> bind IPIPIPIP:80 defer-accept
> bind IPIPIPIP:443 ssl crt /etc/haproxy/ssl/CERT.pem defer-accept
> ciphers RC4:HIGH:!aNULL:!MD5
> maxconn 100000
> default_backend backend-10
> log global
> mode http
> option httplog
> option dontlog-normal
> acl SITE-DEAD nbsrv(backend-10) lt 1
> redirect location http://we-are-down.site.tld code 303 if SITE-DEAD
>
> backend backend-10
> balance roundrobin
> option http-server-close
> option httpchk GET /test HTTP/1.0\nHost: site.tld\nConnection:
> close\n\n
> cookie JSESSIONID prefix
> server server1 IPIPIPIP:80 check inter 20000 fall 5 downinter
> 30000 maxconn 2000 cookie 4 weight 1
> server server2 IPIPIPIP:80 check inter 20000 fall 5 downinter
> 30000 maxconn 2000 cookie 5 weight 1
> server server3 IPIPIPIP:80 check inter 20000 fall 5 downinter
> 30000 maxconn 2000 cookie 6 weight 1
> server server4 IPIPIPIP:80 check inter 20000 fall 5 downinter
> 30000 maxconn 2000 cookie 0 weight 1
> appsession JSESSIONID len 64 timeout 3h request-learn mode
> path-parameters
> option redispatch
> option persist
> contimeout 2000
> log global
*t
[1] http://marc.info/?l=haproxy&m=137322331120391&w=2