Hi - a few suggestions/ideas:

1. How many requests per second do you average, and what are your
burst rates like?  What are your vmstat numbers like during average
and burst request processing, and how do they compare to when your
system is lightly loaded?  Getting familiar with these numbers will
help you decide if you have a system capacity problem that perhaps
only shows up during a traffic spike.

2. If increasing minthreads decreased the problems, it sounds like you
may have trouble starting new threads during the day.  Do you have a
huge Tcl library?  This can significantly delay thread startup,
because a very large Tcl string gets eval'd to create your library
procs.  If you have a couple of threads running and a robot decides to
spider your site, it may blast you with 10 requests.  Your server
needs to create 8 new threads, initialize them, handle these requests,
and queue all new incoming requests.  If a server is near its
capacity, a delay of even a few seconds can take a while to recover
from.  Ex: your max processing rate is 10 requests/sec and your average
actual request rate is 8 per second.  You stop handling requests for 5
seconds, so 40 requests queue up.  With only 2 requests/sec of spare
capacity, it'll take 20 seconds to clear this backlog.
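
To make the arithmetic in that example concrete, here is a minimal
Python sketch.  The function names and parameters are mine, purely for
illustration, and it assumes steady arrival and processing rates,
which real traffic won't give you:

```python
def backlog_after_stall(arrival_rate, stall_seconds):
    """Requests that queue up while the server handles nothing."""
    return arrival_rate * stall_seconds


def recovery_seconds(max_rate, arrival_rate, stall_seconds):
    """Seconds needed to drain the backlog once processing resumes.

    Only the spare capacity (max_rate - arrival_rate) goes toward the
    backlog, because new requests keep arriving during recovery.
    """
    spare = max_rate - arrival_rate
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog never clears")
    return backlog_after_stall(arrival_rate, stall_seconds) / spare


print(backlog_after_stall(8, 5))   # 40
print(recovery_seconds(10, 8, 5))  # 20.0
```

The closer your average rate gets to your max rate, the worse this
gets: at 9 requests/sec the same 5 second stall takes 45 seconds to
clear.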

3. How big are the pages you are returning?  Do most of your pages fit
into a socket buffer (usually 32K bytes)?  If not, you'll tie up your
connection threads spooling http output and your server will appear
dead while incoming requests are queued.  This is the main issue that
prevents using a small maxthreads.  Like others, I'd suggest you bump
maxthreads up to 30 or so, but I'd also bump minthreads up to 30
because it sounds like you do have the thread startup problem I
mentioned in #2.

4. I'd run a time-stamped vmstat trace throughout the day to make sure
your system isn't just running out of steam.  On Linux, disk I/O
bottlenecks (specifically, flushing dirty buffers) can frequently
cause a system to appear dead for several seconds, and recovering from
this could take a while if you are already running near capacity.
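
One way to get a time-stamped trace is a small wrapper that prefixes
each line of vmstat output with the wall-clock time.  This is only a
sketch: the helper names and the 5 second interval are my own choices,
and it assumes a vmstat that accepts a delay argument (as the Linux
one does) and Python 3.7 or later:

```python
import subprocess
import sys
from datetime import datetime


def stamp_lines(lines, now=datetime.now):
    """Yield each input line prefixed with the current wall-clock time."""
    for line in lines:
        yield f"{now().strftime('%Y-%m-%d %H:%M:%S')} {line.rstrip()}"


def trace_vmstat(interval_seconds=5, out=sys.stdout):
    """Run `vmstat <interval>` until interrupted, writing stamped lines."""
    proc = subprocess.Popen(
        ["vmstat", str(interval_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for stamped in stamp_lines(proc.stdout):
            out.write(stamped + "\n")
            out.flush()  # keep the log current in case the box wedges
    finally:
        proc.terminate()
```

Calling trace_vmstat() with its output redirected to a file gives you
a log you can line up against the times your uptime monitor reports a
hang.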

5. If your system is truly idle when it is not responding to web
requests, i.e., vmstat shows no I/O activity and no CPU activity, you
may have some kind of locking problem/bug.  I don't know anything
about ACS, so I don't know whether it uses locks extensively.

Good luck!
Jim


> To refresh everyone's memory, I have a situation where a couple of
> older ACS sites on the same system, running nsd3.3+ad13, started
> hanging multiple times per day.  Most of the time by the time I receive
> notification from uptime, the problem has passed and leaves no clue
> behind.  Sometimes I do manage to catch it still hung, but I don't know
> gdb well enough to know what to do, so I just restart it.
>
> Based on suggestions here, I set minthreads to 10 and threadtimeout to
> 3600.  Maxthreads was already set to 10, so I left it alone.
>
> Since doing that there has been a significant reduction in the number
> of these incidents;  one or less per day per site instead of every few
> hours.  But less is not yet zero.
>
> I'm going to try experimenting with raising the above numbers some
> more, but anyone with a more educated guess than mine on what might be
> wrong would be welcome.  I'd also appreciate some simple steps I could
> take to try to figure out what's wrong next time I catch the server in
> a hung state;  I looked at the gdb doc Andrew posted a while back and
> although it would be a great reference if I knew what I was doing, it's
> not enough of a guide to tell me what I should be looking for.
>
> thanks,
>
> janine
>
>
> --
> AOLserver - http://www.aolserver.com/
>
> To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of
> your email blank.
>

