I have simplified this testing by using a stock sample cfg file (content-sw-sample.cfg).
Basically, it seems like heavily loading HAProxy causes some sort of networking resource exhaustion on the server.

Box 1: haproxy
Boxes 2-4: nginx (set up as one of the backends in the default config)

Running all tests from box 1:

  ab -n 10000 -c 100 "box1:81/images/layout/blank.gif"

This test completes in less than 1 second, multiple times in a row. Eventually, however, it gets stuck. The really odd things, though, are that when the haproxy tests from Box1 are stuck:

- hitting some of the Box2-4 cluster from Box1 will also fail (but not necessarily all -- some boxes might still be accessible)
- hitting Boxes 2-4 from any other machine on the network works fine

It's like the route between Box1 and specific machines is getting stuck, and after a minute or two things get back to normal.

On Thu, Jan 15, 2009 at 10:40 PM, Michael Fortson <[email protected]> wrote:
> Our server (a recent 8-core Xeon server) is able to handle a few rounds of benchmarks, but then connections just seem to start hanging somewhere -- they don't show up in haproxy_stats. At the same time, every server listed on the stats page starts going red (even though they are clearly fine -- you can hit them directly without problem). It does not seem to be memory-related (no bulging swap file at any rate).
>
> My config file is as follows:
> ________
> global
>     log 127.0.0.1 local0 notice
>     #log 127.0.0.1 local1 alert
>
>     daemon
>     nbproc 2
>     maxconn 32000
>     ulimit-n 65536
>     # and uid, gid, daemon, maxconn...
>
> defaults
>     # setup logging
>     log global
>     option tcplog
>     option httplog
>     option dontlognull
>
>     # setup statistics pages
>     stats enable
>     stats uri /haproxy_stats
>     stats refresh 10
>     stats auth xxxxxxxxxx:xxxxxxxxxxx
>
>     mode http
>     retries 3
>     option abortonclose
>     option redispatch
>
>     timeout connect 80s   # time in queue
>     timeout client 30s    # data from client (browser)
>     timeout server 80s    # data from server (mongrel)
>
> frontend rails *:7000
>     # set connection limits
>     maxconn 700
>     default_backend mongrels
>
> backend mongrels
>     fullconn 80
>
>     option httpchk
>     option abortonclose
>     contimeout 4000
>
>     balance roundrobin
>
>     server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check inter 2s rise 1 fall 2
>     server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check inter 2s rise 1 fall 2
>     <snip> lots more of those over several machines... </snip>
>
> ## this is the service I'm testing against
> frontend webservers *:81
>     mode tcp
>     default_backend nginx
>
> backend nginx
>     mode tcp
>     balance leastconn
>     contimeout 1000
>     server webapp01-http 10.2.0.3:8080 check
>     server webapp02-http 10.2.0.4:8080 check
>     server webapp04-http 10.2.0.17:8080 check
>     server webapp05-http 10.2.0.18:8080 check
> ___________
>
> The test I'm running is:
>
>   ab -n 10000 -c 20 "nowhere.com:81/images/layout/blank.gif"
>
> ... from another machine on the LAN. It runs fine (and pretty fast) several times in a row, but then something happens; it'll hang somewhere in the test, and none of the requests will show up on the haproxy stats page. Sometimes Apache (which feeds into haproxy as its balancer for Rails requests) starts showing 503s as well.
>
> I'm wondering if I have some bad combination of haproxy configuration, or if I should look elsewhere for the cause... the system is running FC7.
>
> cheers
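
Following up on the stall described at the top of this message: one thing I still need to rule out (this is only a guess, not something I've confirmed) is that Box1 itself is running out of local networking resources under ab's load -- ephemeral ports piling up in TIME_WAIT, or the netfilter conntrack table filling up. Either would fit the pattern of the problem clearing on its own after a minute or two and only affecting connections originating from Box1. Next time it wedges, something like the following should show it:

  # count sockets per TCP state; tens of thousands in TIME_WAIT would point at port exhaustion
  netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn

  # the ephemeral port range the kernel can use for outgoing connections
  cat /proc/sys/net/ipv4/ip_local_port_range

  # if iptables connection tracking is loaded, a full table logs "table full, dropping packet"
  dmesg | tail -50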

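If one of those checks does point at port or conntrack exhaustion, the usual knobs look roughly like this -- a sketch to adapt rather than settings I'm recommending, and the conntrack sysctl name depends on the kernel (newer kernels call it net.netfilter.nf_conntrack_max):

  # widen the ephemeral port range and allow reuse of TIME_WAIT sockets for new outgoing connections
  sysctl -w net.ipv4.ip_local_port_range="1024 65535"
  sysctl -w net.ipv4.tcp_tw_reuse=1

  # raise the connection-tracking table size (sysctl name varies by kernel)
  sysctl -w net.ipv4.netfilter.ip_conntrack_max=131072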
