Hi,

Just a few Linux -> FreeBSD translations, as pfSense is FreeBSD-based.
2015-10-04 10:56 GMT+02:00 Willy Tarreau <[email protected]>:
> On Sat, Oct 03, 2015 at 12:55:33AM -0700, Daren Sefcik wrote:
>> > Is there some kernel messages
>> > Load, swap usage, disk space
>> >
>> again, according to my limited know how, top and other built in utilities
>> all report the system is barely doing anything and there is tons of memory
>> and disk space
>
> Just run "free" after a test and "vmstat 1 10" during a test.
>
>> > During stress :
>> > Is there more sys/interrupt than user cpu usage
>> > Link saturation
>> > Packet lost
>> >
>> I am not sure how to check this, I will try and figure this out but if you
>> have any advice that would be appreciated.
>> The LAN interface is a bonded interface with (3) 1000mb NIC cards so I am
>> doubtful it is being saturated from this simple apache bench test.
>> Here is what the Interfaces status shows me:
>>
>> *Status up*
>> MTU 1500
>> Media autoselect
>> LAGG Protocol lacp lagghash l2,l3,l4
>
> That's interesting. Keep in mind that different aggregation algorithms
> exist, and that hashing on l2+l3+l4 will spread different connections to
> different ports. As long as you have enough connections (which seems to
> be your case) your traffic should be evenly spread. But on low connections
> it can happen that you saturate one link without traffic on the other ones.
> So for now let's consider this not a problem.
>
>> LAGG Ports bge3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>> bge2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>> bge1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>> In/out packets 248989670/305051696 (77.73 GB/88.68 GB)
>> In/out packets (pass) 248989670/305051696 (77.73 GB/88.68 GB)
>> In/out packets (block) 4130394/147 (4.75 GB/70 KB)
>> In/out errors 0/608
>> Collisions 0

Setting sysctl net.link.lagg.lacp.debug=1 should provide some interesting
information.

Broadcom NICs: you should check man 4 bge and
https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards

>>
>> > Suboptimal firewall rules : replay stress packet filter unloaded.
>> >
>> There are only two simple allow firewall rules for LAN access, nothing
>> complicated at all.
>
> No but very likely you're running with conntrack. If it's not properly
> tuned you can quickly end up with a conntrack table full. Please run
> "lsmod" to see the load modules, and "dmesg | grep -i conntrack" to
> see if any such message has appeared, as well as "dmesg | grep -i drop"
> to see if the kernel complained it was forced to drop anything. The
> best thing to try to be sure is to unload all firewall modules,
> especially conntrack.

I had a look at the pfSense kernel config. It seems that pf, pflog, pfsync
and all the netgraph and altq stuff are not built as loadable modules (the
output of kldstat should confirm that), so you can't unload them. This is
not ideal, as simply loading a module could enable some features in the
network stack.

You can first test with pfctl -d to disable pf (better to have console
access when doing this kind of thing). Also make sure you don't have any
QoS enabled.

Run grep kernel /var/log/messages to see if something was logged by the
kernel (or check the dmesg output).

You could also watch how the TCP states evolve during the test.

With a Bourne shell:

    clear; while : ; do netstat -anp tcp | awk '$6 ~ /^[A-Z]/ && $6 !~ /Foreign|LISTEN/ {print $6}' | sort | uniq -c | sort -g ; sleep 2 ; clear ; done

or with csh:

    clear ; while ( 1 == 1 )
    netstat -anp tcp | awk '$6 ~ /^[A-Z]/ && $6 !~ /Foreign|LISTEN/ {print $6}' | sort | uniq -c | sort -g ; sleep 2 ; clear
    end

Best regards
Joris

>
>> I am really stumped by this problem and am hoping you guys can help me get
>> this figured out. If there are any commands I can run to get info that
>> would be helpful please let me know.
>
> In general the situation you describe is observed in a few cases :
> - too low file descriptor limits.
>   A non-root user is limited to 1024
>   hence about 512 end-to-end connections, but I'm assuming you started
>   haproxy as a root user to get enough connections ;
>
> - improperly tuned firewall : this is the most common case. Each end
>   to end connection uses two conntrack entries, one from the client
>   to the proxy and one from the proxy to the server. Connections remain
>   for some time after they are closed due to the TIME_WAIT state and add
>   to the count.
>
> - communications in virtualized environments being limited by improper
>   configuration of the hypervisor. We've got a number of reports, some
>   even public on the list here where hosting providers were unable to
>   configure their hypervisor to stand at least the load of a single VM,
>   so packets were dropped by the hypervisor.
>
> - bogus NIC firmware. We used to face this situation for a few years
>   about 5 years ago, some NICs (netxtreme 2 found on a lot of Proliant
>   servers) were losing up to 12% of the packets, so I let you imagine
>   how TCP performed... We haven't had such a report for the last 2 years
>   so I consider this issue fixed by now.
>
> HAProxy's logs are designed to find exactly what is happening and to dig
> into the problem. So you'll have to post some logs so that we can see if
> there are connection retries, long connection times, or maybe almost no
> request received (in case it blocks upfront).
>
> Hoping this helps,
> Willy
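
P.S. If the output needs to be pasted into a reply, a non-interactive variant of the TCP state counting may be handier than the clear-based loops. The following is only a rough sketch: the function names count_tcp_states and sample_tcp_states are made up for illustration, and it assumes the FreeBSD `netstat -anp tcp` column layout (state in the sixth column), the same assumption the one-liners make.

```shell
#!/bin/sh

# count_tcp_states (hypothetical helper): reads `netstat -anp tcp` output
# on stdin and prints one "COUNT STATE" line per TCP state, sorted by
# ascending count. Header lines and LISTEN sockets are skipped.
count_tcp_states() {
    awk '$6 ~ /^[A-Z]/ && $6 !~ /Foreign|LISTEN/ { seen[$6]++ }
         END { for (state in seen) print seen[state], state }' | sort -n
}

# sample_tcp_states (hypothetical helper): prints $1 samples with a pause
# of $2 seconds after each one (defaults: 5 samples, 2 seconds). Every
# sample is preceded by a timestamp line.
sample_tcp_states() {
    samples=${1:-5}
    interval=${2:-2}
    i=0
    while [ "$i" -lt "$samples" ]; do
        date '+--- %H:%M:%S ---'
        netstat -anp tcp | count_tcp_states
        i=$((i + 1))
        sleep "$interval"
    done
}
```

After sourcing the file in a Bourne shell, something like `sample_tcp_states 10 2 > /tmp/tcp_states.txt` during the test would give ten timestamped snapshots; a TIME_WAIT count that keeps growing would be worth mentioning on the list.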

