I have a server running v1.5-dev17-17-gcf181c9 which "reliably" (after
a few hours) gets into a state where it spends 100% CPU in epoll. It
still works as it should, but calls epoll_wait continuously. I'm pretty
sure it started happening after I adjusted some timeouts and added
"observe layer7".

strace looks like this:

epoll_wait(3, {{EPOLLIN|0x2000, {u32=37, u64=37}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN|0x2000, {u32=37, u64=37}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN|0x2000, {u32=37, u64=37}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN|0x2000, {u32=37, u64=37}}}, 200, 0) = 1
... etc

fd 37 is in CLOSE_WAIT (from lsof):
haproxy 2809070 ha-user  37u     IPv4         2044761425      0t0  TCP <haproxy-server>:27137->s3:8082 (CLOSE_WAIT)
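
For reference, the 0x2000 bit that strace leaves undecoded above should be EPOLLRDHUP (the peer closed, or shut down its writing side), which would fit fd 37 sitting in CLOSE_WAIT. The final argument of 0 is the epoll_wait timeout in milliseconds, so each call returns immediately. Below is a minimal sketch for decoding such a mask. The constants are copied from Linux's <sys/epoll.h>, and the helper name decode_epoll_events is made up for illustration, not something from haproxy:

```python
# Linux epoll event bits, as defined in <sys/epoll.h>.
EPOLL_FLAGS = {
    0x001: "EPOLLIN",
    0x002: "EPOLLPRI",
    0x004: "EPOLLOUT",
    0x008: "EPOLLERR",
    0x010: "EPOLLHUP",
    0x2000: "EPOLLRDHUP",
}


def decode_epoll_events(mask):
    """Return the names of the known bits set in mask (hypothetical helper).

    Any bits not listed in EPOLL_FLAGS are appended as one hex string.
    """
    names = [name for bit, name in sorted(EPOLL_FLAGS.items()) if mask & bit]
    known = 0
    for bit in EPOLL_FLAGS:
        known |= bit
    leftover = mask & ~known
    if leftover:
        names.append(hex(leftover))
    return names


# The mask from the strace lines above: EPOLLIN|0x2000
print(decode_epoll_events(0x001 | 0x2000))
```

If that reading is right, the event stays pending on fd 37 and is never consumed, so every epoll_wait call returns it again straight away.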

I'll try bumping to latest master, but I didn't see any commits that look like they would help.

The config (somewhat bowdlerized, but hopefully not too much):

global
        log localhost local2
        maxconn 70000
        user ha-user
        group haproxy
        daemon
        stats socket /var/run/haproxy-blah.socket mode 0600 level admin

defaults
        errorfile 408 /dev/null

        errorfile 500 errors/500-empty.http
        errorfile 502 errors/500-empty.http
        errorfile 503 errors/500-empty.http
        errorfile 504 errors/500-empty.http
        log global
        mode http
        option httplog

        option http-server-close
        option http-pretend-keepalive

        option forwardfor except 127.0.0.0/8
        retries 3
        option redispatch
        balance roundrobin

        timeout connect 4s
        timeout http-request 30s
        timeout http-keep-alive 5s
        timeout client 29s

        timeout server 5s
        timeout queue 10s

frontend myfrontend
        bind <some-address>:80

        maxconn 68000
        timeout http-keep-alive 30s

        capture request header Host len 30
        capture request header Referer len 40
        capture cookie foo len 40

        capture request header X-Forwarded-For len 66

        default_backend mybackend

backend mybackend
        timeout server 12s
        timeout queue 10s
        option httpchk GET /blah/monitor HTTP/1.1\r\nHost:\ some.server.com

        timeout connect 70
        default-server inter 2000 fastinter 350 rise 3 fall 3 maxconn 12 on-error fail-check error-limit 2
        server s1 s1:8082 check weight 200 observe layer7
        server s2 s2:8082 check weight 200 observe layer7
        server s3 s3:8082 check weight 200 observe layer7

        server faraway 127.1.21.1:1337 check maxconn 200 weight 1


- Finn Arne
