On 2004.08.18, Janine Sisk <[EMAIL PROTECTED]> wrote:
> To refresh everyone's memory, I have a situation where a couple of
> older ACS sites on the same system, running nsd3.3+ad13, started
> hanging multiple times per day.  Most of the time by the time I receive
> notification from uptime, the problem has passed and leaves no clue
> behind.  Sometimes I do manage to catch it still hung, but I don't know
> gdb well enough to know what to do, so I just restart it.
>
> Based on suggestions here, I set minthreads to 10 and threadtimeout to
> 3600.  Maxthreads was already set to 10, so I left it alone.

What class of hardware are these sites running on?  maxthreads=10 is
pretty low, IMHO.  For a single site on a 1.2 GHz P3 running Linux, I'd
comfortably set minthreads=maxthreads=30.

> Since doing that there has been a significant reduction in the number
> of these incidents;  one or less per day per site instead of every few
> hours.  But less is not yet zero.
>
> I'm going to try experimenting with raising the above numbers some
> more, but anyone with a more educated guess than mine on what might be
> wrong would be welcome.

It could very well be that all 10 threads are being occupied for a few
seconds at a time.  For any reasonably active site, it's not hard to
believe that there's a page out there that takes a few seconds, and if
10 people hit that page at exactly the same time, the server will
"stall" until one of those 10 requests is completed.
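
The stall described above can be sketched with a toy model of a fixed
thread pool.  This is only an illustration: the function name, the 3-second
page, and the arrival times are made-up assumptions, not measurements from
these sites, and real request handling is messier than a constant service
time.

```python
import heapq

def wait_times(num_threads, arrivals, service_time):
    """Return how long each request waits for a free thread.

    arrivals: request arrival times in seconds, in ascending order.
    service_time: seconds each request holds a thread (assumed constant).
    """
    # Min-heap of the times at which each thread next becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)
    waits = []
    for t in arrivals:
        earliest = heapq.heappop(free_at)   # soonest-available thread
        start = max(t, earliest)            # wait if no thread is free yet
        waits.append(start - t)
        heapq.heappush(free_at, start + service_time)
    return waits

# Hypothetical load: ten simultaneous hits on a 3-second page,
# then one more request half a second later.
arrivals = [0.0] * 10 + [0.5]
waits_10 = wait_times(10, arrivals, 3.0)
waits_30 = wait_times(30, arrivals, 3.0)
print(waits_10[-1])  # 2.5: the eleventh request stalls until a thread frees up
print(waits_30[-1])  # 0.0: with 30 threads it is served immediately
```

Under these assumptions, the eleventh visitor sees a 2.5-second stall with
maxthreads=10 and none with maxthreads=30, even though no thread is
actually stuck.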

> I'd also appreciate some simple steps I could
> take to try to figure out what's wrong next time I catch the server in
> a hung state;  I looked at the gdb doc Andrew posted a while back and
> although it would be a great reference if I knew what I was doing, it's
> not enough of a guide to tell me what I should be looking for.

gdb is probably premature at this point.  Do you have the control port
enabled on these servers?  I think the control port gets its own thread
regardless of maxthreads, so if the server is hung like this because
you're maxing out threads, you should still be able to log in via nscp
and issue "ns_server threads" to check the thread counts.  If you can
time it right and run it while the site is hung, current should equal
max and idle should be zero -- that's how you know you've run out of
threads.
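
The check above (current equal to max, idle at zero) can be written as a
small script for saved output.  This is a sketch only: it assumes that
"ns_server threads" prints a flat list of name/value pairs, and the sample
line below is hypothetical rather than captured from nsd3.3+ad13, so the
exact field names may differ on a real server.

```python
def parse_thread_stats(output):
    """Parse a flat 'name value name value ...' line into a dict of ints."""
    tokens = output.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected name/value pairs, got: %r" % output)
    return {name: int(value) for name, value in zip(tokens[::2], tokens[1::2])}

def threads_exhausted(stats):
    """True when every allowed thread exists and none of them is idle."""
    return stats["current"] >= stats["max"] and stats["idle"] == 0

# Hypothetical output, assumed format (not taken from a real server).
sample = "min 10 max 10 current 10 idle 0 stopping 0"
stats = parse_thread_stats(sample)
print(threads_exhausted(stats))  # True
```

If the pasted output has some idle threads at the moment of the hang, the
function returns False, which would point away from thread exhaustion as
the cause.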

-- Dossy

--
Dossy Shiobara                       mail: [EMAIL PROTECTED]
Panoptic Computer Network             web: http://www.panoptic.com/
  "He realized the fastest way to change is to laugh at your own
    folly -- then you can let go and quickly move on." (p. 70)
