Thanks! Below is an excerpt of the haproxy conf, does it look OK? Will HAProxy set the ulimit (ulimit-n) for the haproxy user by itself? And how can I tell whether root actually managed to apply the specified ulimit? I am asking because we have had issues where we needed scripts in which root set the ulimit manually and then issued su $user -p $command so that the limit applied to a certain user. Changing /etc/security/limits.conf does not seem to affect this, at least not without a reboot.
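For reference, here is roughly how I have been checking what the running process actually ended up with (as far as I understand, haproxy calls setrlimit() itself at startup when ulimit-n is set, before switching to the haproxy user, so the value the kernel reports for the process is what counts; this is only a sketch and assumes a stock Linux with /proc mounted):

# File-descriptor limit the kernel actually applied to the running haproxy
# process (independent of limits.conf and login shells):
pid=$(pidof haproxy | awk '{print $1}')
grep 'Max open files' /proc/$pid/limits

# For comparison, what a fresh login session for the haproxy user would get
# via pam_limits. limits.conf only applies to new PAM sessions (no reboot
# needed), and already-running processes keep whatever limits they started
# with:
su - haproxy -s /bin/sh -c 'ulimit -n'

If the first number matches the ulimit-n from the global section, root was able to set it; if not, something prevented the setrlimit() call from succeeding.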
global
    log 127.0.0.1 local0 notice
    user haproxy
    group haproxy
    maxconn 32000
    ulimit-n 262144

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
....

I have these settings in /etc/sysctl.conf:

net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 262144
net.core.somaxconn = 262144

I attach the full sysctl.conf for completeness. I am sure it contains a lot of superfluous rows since much of it is copy-pasted, but I have tried to go through each setting to understand what it affects. (Quick sanity checks for these settings are sketched in the PS at the bottom of this mail, below the quote.)

About the swap: yeah, the machine ran out of memory because an auto-restart script started too many Java processes.

Cheers and happy new year!

//Marcus

On Thu, Dec 31, 2009 at 3:15 PM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Marcus,
>
> On Wed, Dec 30, 2009 at 02:04:26PM +0100, Marcus Herou wrote:
> > Hi Willy, thanks for your answer, it got filtered, that's why I missed it
> > for two weeks.
>
> No problem, it happens to me too from time to time.
>
> > Let's start with describing the service.
> >
> > We are hosting javascripts of sizes up to 20K, and serve flash and image
> > banners as well, which of course are larger. That is basically it: Ad
> > Serving.
> >
> > On the LBs we have about 2 MByte/s per LB = 2x2 MByte/s = 4 MByte/s
> > ~30 MBit/s at peak, so that is not the issue.
>
> OK, so you're running at approx 100 hits/s per LB on average.
>
> > I've created a little script which parses the "active connections" from
> > the HAProxy stats interface and plots it into Cacti; it peaks at 100
> > (2x100) connections per machine, which is very little in your world I
> > guess.
>
> It's not "little", I'd even say it's in the average, as most sites are
> running at very low rates.
>
> > I've attached a plot of tcp connections as well. Nothing fancy there
> > either, besides that the number of TIME_WAIT sockets is in the 10000
> > range (log scale).
>
> 10000 TIME_WAIT with a default setting of 60 seconds means about 160
> sessions per second on average. That's still very reasonable and does not
> require particular tuning.
>
> > Here's the problem:
> >
> > Every other day I receive alarms from Pingdom that the service is not
> > available, and if I watch the syslog I see, at about the same times,
> > hints about a possible SYN flood. At the same times we receive emails
> > from sites using us that our service is damn slow.
> >
> > What I feel is that we get "hiccups" on the LBs somehow and that requests
> > get queued. If I count the number of rows in the access logs on the
> > machines behind the LB, it decreases at the same times and by the same
> > factor on each machine (perhaps 10-20%), leading me to think that the
> > bottleneck is not on the backend side.
>
> this means to me that:
> 1) your SYN backlog is too short. It defaults to 128 packets per socket on
>    Linux (min of net.core.somaxconn and net.ipv4.tcp_max_syn_backlog). So
>    you need to increase them (around 10000 for both always gives me good
>    results).
> 2) you may be experiencing SYN flood attacks from time to time.
> 3) you have not enabled SYN cookies, which can protect against such issues,
>    especially during SYN attacks. You can enable them with
>    net.ipv4.tcp_syncookies.
>
> If you don't get any attack, #1 should be enough, but #3 is a good
> complement that acts once #1 is not enough anymore.
>
> It is also possible that you have not enabled enough connections in haproxy
> and that the port is saturated for a long time.
> But this will still be triggered by #1 above. This can be monitored on
> haproxy's stats page (limit and max on the frontend's sessions).
>
> > A little more about the backend servers:
> >
> > We have an ad publishing system which pushes data to the web servers,
> > enabling them to act almost 100% static; this has been the key thing
> > which I tuned some years ago. Initially every request went to a DB, but
> > now it is just a simple Hashtable which is replicated from a "master".
> >
> > The backend servers have very little to do and consume very little
> > resources. Example:
> >
> > top - 11:34:23 up 366 days, 1:15, 1 user, load average: 0.37, 0.25, 0.23
> > Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 0.8%us, 0.5%sy, 0.0%ni, 94.0%id, 4.6%wa, 0.0%hi, 0.2%si, 0.0%st
> > Mem:  4052904k total, 4008696k used,   44208k free,  292932k buffers
> > Swap: 3903784k total,    9240k used, 3894544k free, 2145340k cached
>
> You should be careful, this one has swapped at least once. It's very nasty
> to use swap on any web server, as this considerably increases response
> times.
>
> Regards,
> Willy

--
Marcus Herou
CTO and co-founder, Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
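PS: a rough sketch of the checks I run to confirm that the kernel and haproxy actually picked up the values discussed above. The stats URL below is an assumption on my part, so adjust it to wherever the stats page is actually exposed ("stats uri"):

# Values the running kernel is using right now; sysctl.conf is only read at
# boot or when "sysctl -p" is run:
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_syncookies

# Frontend session counters from the stats page in CSV form: comparing
# scur/smax against slim shows whether the frontend maxconn ever becomes the
# ceiling. Column positions are taken from the CSV header line
# (pxname,svname,qcur,qmax,scur,smax,slim,stot,...); double-check against the
# header your version actually returns.
curl -s 'http://127.0.0.1/haproxy?stats;csv' | \
  awk -F, '$2 == "FRONTEND" {print $1, "scur=" $5, "smax=" $6, "slim=" $7}'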
[Attachment: sysctl.conf]