Thanks for the response. Can you explain what nbproc does, since it sounds like I am using it incorrectly? My VM shows 4 CPU cores.
I will run tcpdump in the background for a few hours to try and see if there is major network latency.

Thanks!

On 6/4/10, Willy Tarreau <[email protected]> wrote:
> Hi,
>
> On Wed, Jun 02, 2010 at 01:19:59AM -0400, Geoffrey Mina wrote:
>> Greetings,
>> We recently deployed HAProxy in a virtualized environment. I am having
>> some problems with occasional socket accept errors.
>
> Where do you observe those errors? I'm seeing you have "nbproc 4" which you
> shouldn't be using. It is possible that you're just observing some of the
> processes waking up, performing an accept() which returns -1 EAGAIN and that
> you take that for an error.
>
>> We are seeing the problems primarily on the pure TCP load balancing
>> portion of our configuration. The load balancer is running in the
>> Rackspace Cloud under Xen. Basically what I am seeing is that sockets
>> never get nailed up, even though I am 110% sure all the back-end servers
>> are operating fine. We have secondary monitoring processes which are
>> constantly setting up and tearing down sockets directly (bypassing
>> HAProxy) to ensure that the servers are up and running. I have provided
>> our config and some other information below. If anyone can point me in
>> the right direction for figuring out this issue, I would greatly
>> appreciate it.
>
> I suppose you have already performed the usual tuning bits (tune or disable
> iptables, etc...).
>
> One thing that can happen in virtualized environments is that the haproxy VM
> starves without getting access to the CPU for long periods of time if your
> hosting provider sells more power than the physical machines can provide.
> It is also possible that network packets get queued up for a very long time
> between VMs because the physical network (or even the physical machines)
> are overloaded. I have already observed pings up to 7 seconds on an EC2
> platform that finally migrated to Rackspace to solve such issues, but since
> it was more than a year ago, that does not mean they cannot experience
> similar trouble now :-)
>
> The most important thing to do in such environments is to sniff traffic in
> real time. Since you have zero control over the resource allocation and the
> timing, the best you can do is observe if locally initiated I/Os reach their
> target in time or not.
>
> Regards,
> Willy
>

--
Sent from my mobile device
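
For anyone following the thread, here is a rough sketch of the accept()/EAGAIN situation Willy describes. It is not HAProxy code: it simulates two workers in a single Python process sharing one non-blocking listening socket, and the helper names `try_accept` and `demo` are made up for illustration. The point is only that the second accept() finding an empty queue is normal, not a failure.

```python
import select
import socket


def try_accept(listener):
    """Try one non-blocking accept(); return the connection, or None if none is queued."""
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: another worker already took the connection
        # (or none arrived). This is expected and not a real error.
        return None


def demo():
    """Simulate two workers waking up for a single incoming connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)

    client = socket.create_connection(listener.getsockname())
    # Wait until the listener is readable, as each worker's poll loop would.
    select.select([listener], [], [], 2.0)

    first = try_accept(listener)   # "worker 1" wins the connection
    second = try_accept(listener)  # "worker 2" finds the queue empty

    outcome = (first is not None, second is None)
    for sock in (first, client, listener):
        if sock is not None:
            sock.close()
    return outcome
```

If a trace or log shows accept() returning -1 with EAGAIN next to a successful accept in another process, this is most likely what is happening.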

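
Alongside tcpdump, one possible way to check whether locally initiated I/Os reach their target in time is to time TCP connects from the load balancer VM to a back-end. This is only a sketch under assumptions: the function names, the 5 second timeout and the threshold are hypothetical, and it complements a packet capture rather than replacing it.

```python
import socket
import time


def connect_latency(host, port, timeout=5.0):
    """Return the seconds needed to complete a TCP connect, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None


def slow_samples(samples, threshold):
    """Keep the samples that failed (None) or took longer than threshold seconds."""
    return [s for s in samples if s is None or s > threshold]
```

Calling `connect_latency` in a loop against each back-end address and logging whatever `slow_samples` returns, with timestamps, would give something to line up against the tcpdump capture.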
