Thanks!

Below is an excerpt of the haproxy conf, does it look OK? Will HAProxy set
the ulimit (ulimit-n) for the haproxy user? And how can I tell whether root
was actually able to set the specified ulimit? I am asking because we have
had issues before where we needed to create scripts in which root sets the
ulimit manually and then issues su $user -p $command to apply the limit for
a certain user. Changing /etc/security/limits.conf does not seem to affect
this, at least not without a reboot.
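
Roughly the kind of wrapper I mean, just as a sketch (the limit here is an
example and $user / $command are whatever the script gets passed):

        #!/bin/sh
        # run as root: raise the fd limit first, then switch to the service
        # user; the process started via su inherits the raised limit,
        # unless PAM resets it from limits.conf
        ulimit -n 262144
        exec su -p "$user" -c "$command"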

global
        log 127.0.0.1   local0 notice
        user haproxy
        group haproxy
        maxconn 32000
        ulimit-n 262144

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 2000
        contimeout      5000
        clitimeout      50000
        srvtimeout      50000

....
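
For what it's worth, one way to see what the running process actually ended
up with (assuming a single haproxy pid and a kernel recent enough to expose
/proc/<pid>/limits):

        # effective fd limit of the running haproxy process
        grep "Max open files" /proc/$(pidof haproxy)/limits
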
I have these settings in /etc/sysctl.conf

net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 262144
net.core.somaxconn = 262144

I attach the sysctl.conf for completeness. I am sure it contains lots of
stupid config rows since it is very much copy-pasted, but I have tried to go
through each setting to understand what it affects.
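
For reference, the sysctl part at least can be reloaded and verified in
place, no reboot needed:

        sysctl -p                    # re-read /etc/sysctl.conf
        sysctl net.core.somaxconn    # show the value currently in effect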

About the swap, yeah, the machine ran out of memory because an auto-restart
script started too many Java processes.

Cheers and happy new year!

//Marcus





On Thu, Dec 31, 2009 at 3:15 PM, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Marcus,
>
> On Wed, Dec 30, 2009 at 02:04:26PM +0100, Marcus Herou wrote:
> > Hi Willy, thanks for your answer, it got filtered, that's why I missed it
> > for two weeks.
>
> No problem, it happens to me too from time to time.
>
> > Let's start with describing the service.
> >
> > We are hosting javascripts of the sizes up to 20K and serve flash and
> > image banners as well which of course are larger. That is basically it..
> > Ad Serving.
> >
> > On the LB's we have about 2MByte/s per LB  = 2x2MByte/s = 4MByte/s
> > ~30MBit/s at peak, that is not the issue.
>
> OK so you're running at approx 100 hits/s per LB on average.
>
> > I've created a little script which parses the "active connections" from
> > the HAProxy stat interface and plots it into Cacti, it peaks at 100
> > (2x100) connections per machine, which is very little in your world I
> > guess.
>
> It's not "little", I'd even say it's in the average, as most sites are
> running at very low rates.
>
> > I've attached a plot of tcp-connections as well. Nothing fancy there
> > either, besides that the number of TIME_WAIT sockets is in the 10000
> > range (log scale).
>
> 10000 TIME_WAIT with a default setting of 60 seconds means 160 sessions per
> second on average. That's still very reasonable and does not require
> particular tuning.
>
> > Here's the problem:
> >
> > Every other day I receive alarms from Pingdom that the service is not
> > available, and if I watch the syslog I get hints about a possible SYN
> > flood at about the same times. At the same times we also receive emails
> > from sites using us that our service is damn slow.
> >
> > What I feel is that we get "hiccups" on the LB's somehow and that
> > requests get queued. If I count the number of rows in the access logs on
> > the machines behind the LB, it decreases at the same times and by the
> > same factor on each machine (perhaps 10-20%), leading me to think that
> > the narrow point is not on the backend side.
>
> this means to me that:
>  1) your SYN backlog is too short. It defaults to 128 packets per socket on
>     Linux (min of net.core.somaxconn and net.ipv4.tcp_max_syn_backlog). So
>     you need to increase them (around 10000 for both always gives me good
>     results).
>  2) you may be experiencing SYN flood attacks from time to time.
>  3) you have not enabled SYN cookies, which can protect against such issues
>     especially during SYN attacks. You can enable them with
> net.ipv4.tcp_syncookies.
>
> If you don't get any attack, #1 should be enough, but #3 is a good
> complement that acts once #1 is not enough anymore.
>
> It is also possible that you have not enabled enough connections in haproxy
> and that the port is saturated for a long time. But this will still be
> triggered by #1 above. This can be monitored on haproxy's stats page
> (limit and max on the frontend's sessions).
>
> > A little more about the backend servers:
> >
> > We have an ad publishing system which pushes data to the web-servers,
> > enabling them to act almost 100% static; this has been the key thing
> > which I tuned some years ago. Initially every request went to a DB, but
> > now it is just a simple Hashtable which is replicated from a "master".
> >
> > The backend servers have very little to do and consume very little
> > resources:
> > Example:
> > top - 11:34:23 up 366 days,  1:15,  1 user,  load average: 0.37, 0.25, 0.23
> > Tasks:  79 total,   1 running,  78 sleeping,   0 stopped,   0 zombie
> > Cpu(s):  0.8%us,  0.5%sy,  0.0%ni, 94.0%id,  4.6%wa,  0.0%hi,  0.2%si,  0.0%st
> > Mem:   4052904k total,  4008696k used,    44208k free,   292932k buffers
> > Swap:  3903784k total,     9240k used,  3894544k free,  2145340k cached
>
> You should be careful, this one has swapped at least once. It's very nasty
> to use swap on any web server, as this considerably increases response
> times.
>
> Regards,
> Willy
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/

Attachment: sysctl.conf
Description: Binary data
