I have simplified this testing by using a stock sample cfg file (content-sw-sample.cfg).

Basically, it seems like heavily loading HAProxy causes some sort of
networking resource exhaustion on the server.

Box 1:
haproxy

Boxes 2-4:
nginx (set up as one of the backends in the default config)

Running all tests from box 1:
ab -n 10000 -c 100  "http://box1:81/images/layout/blank.gif"

This test completes in less than 1 second, multiple times in a row.
Eventually, however, it gets stuck. The really odd part is what happens
while the haproxy test from Box1 is stuck:

- hitting some of the Box2-4 cluster directly from Box1 also fails (but
not necessarily all of them -- some boxes might still be accessible)
- hitting Boxes 2-4 from any other machine on the network works fine

It's like the route between Box1 and specific machines is getting
stuck, and after a minute or two things get back to normal.
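
If it really is some kind of networking resource exhaustion, the checks
below are what I'd look at on Box1 while a test is hung. This is only a
rough sketch -- the conntrack paths depend on which kernel/iptables
modules are loaded on this box, so treat those as guesses:

# distribution of TCP socket states; a huge pile of TIME_WAIT or
# SYN_SENT sockets on Box1 would point at ephemeral port exhaustion
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c

# ephemeral port range the kernel can use for outgoing connections
cat /proc/sys/net/ipv4/ip_local_port_range

# if iptables connection tracking is loaded, a full table silently
# drops packets (the kernel logs a "table full, dropping packet" message)
dmesg | grep -i conntrack
wc -l /proc/net/ip_conntrack
cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max

Re-running the same ab test with -k (HTTP keep-alive) so it reuses
connections instead of opening ~10000 new ones might also be a useful
comparison: if it never gets stuck with -k, that points at connection
churn on Box1 rather than at haproxy itself.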

On Thu, Jan 15, 2009 at 10:40 PM, Michael Fortson <[email protected]> wrote:
> Our server (a recent 8-core Xeon box) is able to handle a few
> rounds of benchmarks, but then connections just seem to start hanging
> somewhere -- they don't show up in haproxy_stats. At the same time,
> every server listed on the stats page starts going red (even though
> those servers are clearly fine -- you can hit them directly without problem).
> It does not seem to be memory-related (no bulging swap file at any
> rate).
>
> My config file is as follows:
> ________
> global
>        log 127.0.0.1 local0 notice
>        #log 127.0.0.1 local1 alert
>
>        daemon
>        nbproc 2
>        maxconn 32000
>        ulimit-n 65536
>        # and uid, gid, daemon, maxconn...
>
> defaults
>        # setup logging
>        log global
>        option tcplog
>        option httplog
>        option dontlognull
>
>        # setup statistics pages
>        stats   enable
>        stats uri  /haproxy_stats
>        stats refresh 10
>        stats auth   xxxxxxxxxx:xxxxxxxxxxx
>
>
>        mode            http
>        retries         3
>        option          abortonclose
>        option          redispatch
>
>
>        timeout connect 80s  # max time to get a connection to a server
>        timeout client  30s  # data from client (browser)
>        timeout server  80s  # data from server (mongrel)
>
> frontend rails *:7000
>        # set connection limits
>        maxconn         700
>        default_backend mongrels
>
> backend mongrels
>        fullconn        80
>
>        option httpchk
>        option abortonclose
>        contimeout 4000
>
>        balance roundrobin
>
>        server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check inter 2s rise 1 fall 2
>        server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check inter 2s rise 1 fall 2
>       <snip> lots more of those over several machines... </snip>
>
> ## this is the service I'm testing against
> frontend webservers *:81
>        mode tcp
>        default_backend nginx
>
> backend nginx
>        mode tcp
>        balance leastconn
>        contimeout 1000
>        server webapp01-http 10.2.0.3:8080 check
>        server webapp02-http 10.2.0.4:8080 check
>        server webapp04-http 10.2.0.17:8080 check
>        server webapp05-http 10.2.0.18:8080 check
> ___________
>
> the test I'm running is:
> ab -n 10000 -c 20  "http://nowhere.com:81/images/layout/blank.gif"
>
> ... from another machine on the LAN. It runs fine (and pretty fast)
> several times in a row, but then something happens; it'll hang
> somewhere in the test, and none of the requests will show up on the
> haproxy stats page. Sometimes Apache (which uses haproxy as its
> balancer for Rails requests) starts showing 503s as well.
>
>
> I'm wondering if I have some bad combination of haproxy configuration
> settings, or if I should look elsewhere for the cause... the system is
> running FC7.
>
> cheers
>
