On Sat, Oct 03, 2015 at 12:55:33AM -0700, Daren Sefcik wrote:
> > Is there some kernel messages
> > Load, swap usage, disk space
> >
> again, according to my limited know how, top and other built in utilities
> all report the system is barely doing anything and there is tons of memory
> and disk space

Just run "free" after a test and "vmstat 1 10" during a test.
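
If reading the raw "vmstat 1 10" columns is awkward, the user/system split asked about above can be averaged with a small script. This is only a sketch: `summarize_vmstat` is a hypothetical helper name, and it assumes the CPU columns appear in the header as "us sy id" in that order, with the first sample being the since-boot average.

```python
def summarize_vmstat(text):
    """Average the us/sy/id CPU columns over the samples of a vmstat run."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The column header is the row that names both "us" and "id".
    header = next(cols for cols in rows if "us" in cols and "id" in cols)
    us = header.index("us")  # assumption: "sy" and "id" directly follow "us"
    samples = [
        (int(cols[us]), int(cols[us + 1]), int(cols[us + 2]))
        for cols in rows
        if len(cols) == len(header) and cols[us].isdigit()
    ]
    if not samples:
        raise ValueError("no vmstat samples found")
    # The first sample is an average since boot, so skip it when possible.
    if len(samples) > 1:
        samples = samples[1:]
    count = len(samples)
    return {
        name: sum(sample[i] for sample in samples) / count
        for i, name in enumerate(("us", "sy", "id"))
    }
```

Feed it the captured output of a run taken during the test; a "sy" average well above "us" would point at kernel or interrupt work rather than at the proxy itself.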

> > During stress :
> > Is there more sys/interrupt than user cpu usage
> > Link saturation
> > Packet lost
> >
> I am not sure how to check this, I will try and figure this out but if you
> have any advice that would be appreciated.
> The LAN interface is a bonded interface with (3) 1000mb NIC cards so I am
> doubtful it is being saturated from this simple apache bench test.
> Here is what the Interfaces status shows me:
> 
> *Status up*
> MTU 1500
> Media autoselect
> LAGG Protocol lacp lagghash l2,l3,l4

That's interesting. Keep in mind that different aggregation algorithms
exist, and that hashing on l2+l3+l4 will spread different connections
over different ports. As long as you have enough connections (which seems
to be your case) your traffic should be evenly spread. But with only a few
connections you can saturate one link while the others carry no traffic.
So for now let's consider this not a problem.
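
To illustrate why the number of connections matters, here is a toy model of per-flow hashing. It is not the hash the lagg driver actually uses: CRC32 over the address/port tuple is a stand-in chosen for the example, and `pick_port`, `spread` and the addresses are hypothetical.

```python
import zlib
from collections import Counter


def pick_port(src_ip, src_port, dst_ip, dst_port, n_ports):
    """Map one flow to one member port; every packet of the flow follows it."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    # Stand-in hash (assumption): the real driver uses its own function.
    return zlib.crc32(key) % n_ports


def spread(n_flows, n_ports=3):
    """Count how many flows land on each port when only the source port varies."""
    counts = Counter()
    for i in range(n_flows):
        port = pick_port("192.0.2.10", 40000 + i, "192.0.2.20", 80, n_ports)
        counts[port] += 1
    return [counts[p] for p in range(n_ports)]


if __name__ == "__main__":
    print("1 flow:    ", spread(1))     # a single flow can only use one link
    print("1000 flows:", spread(1000))  # many flows tend to even out
```

A single flow always sticks to one link, so a test with very few connections can never use more than a fraction of the aggregate bandwidth.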

> LAGG Ports bge3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
> bge2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
> bge1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
> In/out packets 248989670/305051696 (77.73 GB/88.68 GB)
> In/out packets (pass) 248989670/305051696 (77.73 GB/88.68 GB)
> In/out packets (block) 4130394/147 (4.75 GB/70 KB)
> In/out errors 0/608
> Collisions 0
> 
> > Suboptimal firewall rules : replay stress packet filter unloaded.
> >
> There are only two simple allow firewall rules for LAN access, nothing
> complicated at all.

No, but very likely you're running with conntrack. If it's not properly
tuned you can quickly end up with a full conntrack table. Please run
"lsmod" to see the loaded modules, and "dmesg | grep -i conntrack" to
see if any such message has appeared, as well as "dmesg | grep -i drop"
to see if the kernel complained it was forced to drop anything. The
best way to be sure is to unload all firewall modules, especially
conntrack, and run the test again.

> I am really stumped by this problem and am hoping you guys can help me get
> this figured out. If there are any commands I can run to get info that
> would be helpful please let me know.

In general the situation you describe is observed in a few cases :
  - file descriptor limits that are too low. A non-root user is limited
    to 1024 descriptors, hence about 512 end-to-end connections (each one
    needs a socket on the client side and one on the server side), but
    I'm assuming you started haproxy as root to get enough connections.

  - improperly tuned firewall : this is the most common case. Each end
    to end connection uses two conntrack entries, one from the client
    to the proxy and one from the proxy to the server. Connections remain
    for some time after they are closed due to the TIME_WAIT state and add
    to the count.

  - communications in virtualized environments being limited by improper
    configuration of the hypervisor. We've got a number of reports, some
    even public on the list here, where hosting providers were unable to
    configure their hypervisor to withstand even the load of a single VM,
    so packets were dropped by the hypervisor.

  - bogus NIC firmware. We faced this situation for a few years, about
    5 years ago : some NICs (NetXtreme 2, found on a lot of ProLiant
    servers) were losing up to 12% of the packets, so I'll let you imagine
    how TCP performed... We haven't had such a report for the last 2 years,
    so I consider this issue fixed by now.
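
The first two cases lend themselves to a quick back-of-the-envelope check. The sketch below is only an estimate under stated assumptions: two sockets and two conntrack entries per proxied connection (as described above), a steady connection rate, and a TIME_WAIT linger passed in as a parameter because the real timeout depends on the system. The function names are hypothetical.

```python
def max_proxied_connections(fd_limit, reserved=0):
    """Each end-to-end connection needs two sockets: client side and server side."""
    return max(fd_limit - reserved, 0) // 2


def conntrack_entries(conn_rate, avg_duration_s, time_wait_s):
    """Rough steady-state table size: two entries per connection, each kept
    for the connection's lifetime plus the TIME_WAIT linger (assumption)."""
    return round(2 * conn_rate * (avg_duration_s + time_wait_s))


if __name__ == "__main__":
    import resource  # Unix-only standard library module

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        print("fd soft limit:", soft, "->", max_proxied_connections(soft), "connections")
    # Example figures only: 1000 conn/s, 1 s per connection, 60 s linger.
    print("estimated conntrack entries:", conntrack_entries(1000, 1, 60))
```

With these example figures the estimate is far above the few tens of thousands of entries a small default table might hold, which is why a short benchmark can fill the table while the machine looks idle.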

HAProxy's logs are designed to show exactly what is happening and to let
you dig into the problem. So you'll have to post some logs so that we can
see if there are connection retries, long connection times, or maybe almost
no requests received (in case it blocks upfront).
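
As a starting point before posting, retries and slow connects can be pulled out of the logs with a small filter. This sketch assumes the default HTTP log format enabled by "option httplog", where the timers appear as Tq/Tw/Tc/Tr/Tt and the last field of the connection counters is the retry count; `suspicious` and the 1000 ms threshold are hypothetical choices, and a custom log-format would need a different pattern.

```python
import re

# Timers, status, bytes, two cookie fields, termination state, then
# actconn/feconn/beconn/srv_conn/retries (layout assumed from "option httplog").
LOG_RE = re.compile(
    r" (?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/(?P<Tt>\+?\d+)"
    r" (?P<status>-?\d+) \+?\d+ \S+ \S+ (?P<term>\S+)"
    r" \d+/\d+/\d+/\d+/(?P<retries>\+?\d+) "
)


def suspicious(lines, tc_limit_ms=1000):
    """Return entries with a failed or slow server connect, or with retries."""
    hits = []
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # not an HTTP log line in the assumed format
        tc = int(match.group("Tc"))            # -1 means never connected
        retries = int(match.group("retries"))  # int() accepts a leading "+"
        if tc < 0 or tc >= tc_limit_ms or retries > 0:
            hits.append({
                "Tc": tc,
                "retries": retries,
                "status": int(match.group("status")),
                "term": match.group("term"),
            })
    return hits
```

A connect time close to 3000 ms together with a retry is typical of a lost SYN, which would point back at the packet-drop causes listed above.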

Hoping this helps,
Willy

