On Tue, Jun 29, 2010 at 03:23:38PM -0700, Sep Ng wrote:

> Basically we had aolservers running and while serving pages, it's also
> doing some heavy load processing from a ton of scheduled custom
> written procedures.

Scheduled using AOLserver's built-in scheduler (ns_schedule_proc,
ns_schedule_daily, or the like)?

> Aolserver crashes and segmentation faults are fairly frequent and
> the logs at the time pointed to these running threads as a probable
> cause.

Then the first place to look is your custom code; it's the most
likely place for the bug.  Is your scheduled code purely Tcl, or does
it use any C code?  If you turn off your scheduled procs, does the
crashing go away?

This is a debugging problem; you need to find the bug before you
decide how to fix it.  After the crash, look at the core file's stack
trace in a debugger and see if that gives you any clues.  Can you
reproduce the problem by hitting your development AOLserver with a
particular load-testing script?  If the problem is non-obvious, you'll
probably need that to track it down.
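
As a rough illustration of the load-testing idea above, here is a minimal
sketch in Python (not Tcl, and not anything AOLserver-specific).  The
names hit and load_test, and the request and concurrency counts, are all
hypothetical; the URL you pass in would be whatever page on your
development server you suspect.  It only fires concurrent GET requests
and tallies the outcomes, so that you can see whether a given load makes
the server start refusing connections or returning errors.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Talk to the server directly, ignoring any proxy set in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def hit(url, timeout=10.0):
    """Fetch url once.

    Returns the HTTP status code, or the exception class name when no
    HTTP response was received (e.g. connection refused, timeout).
    """
    try:
        with _opener.open(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except Exception as exc:
        return type(exc).__name__


def load_test(url, requests=200, concurrency=20):
    """Issue `requests` GETs, `concurrency` at a time; tally the outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(hit, [url] * requests))
```

Calling load_test with the URL of a suspect page returns a Counter keyed
by status code or exception name.  A sudden run of connection errors in
that tally is a hint that the server went down during the run.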

Your focus on AOLserver's thread creation and scheduling mechanisms
seems misplaced.  You're speculating about ways to fix some imagined
problem, but you don't know yet whether your actual problem has any
similarity at all to your speculations.

> So basically, what I'm currently beating my head over is to
> build a much cleaner and better way of handling all the load

It's not clear that building any such thing will help you.  If the
crash-inducing bugs are in your custom scheduled code, it's fairly
likely that they're still going to crash no matter what thread you run
them in or how you go about scheduling those threads.

If after lots of looking you REALLY can't find the crash-causing
bug(s), THEN I'd start thinking about ways to live with and ameliorate
the problem.  The simplest one of course, which you've probably
already done, is to just let your AOLserver crash and make sure that
it's always able to come back up quickly and pick up as close to where
it left off as possible.
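
The "let it crash and come back up" approach can be sketched as a small
restart loop.  This is only an illustration in Python; the function name
run_forever and its parameters are hypothetical, and in practice an
existing process supervisor would normally do this job.  It assumes the
server is started as a foreground command, so that the loop can wait on
it and see its exit status.

```python
import subprocess
import time


def run_forever(cmd, max_restarts=None, delay=1.0):
    """Run cmd, restarting it whenever it exits abnormally.

    Stops when the command exits with status 0, or after max_restarts
    restarts (None means no limit).  Returns the list of exit codes seen.
    On POSIX, a negative code means the process was killed by that
    signal number (e.g. -11 for a segmentation fault).
    """
    codes = []
    while True:
        code = subprocess.call(cmd)
        codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        if max_restarts is not None and len(codes) > max_restarts:
            break  # give up after the allowed number of restarts
        time.sleep(delay)  # brief pause so a crash loop does not spin
    return codes
```

The delay matters more than it looks: if the server crashes immediately
on startup, restarting it without a pause just burns CPU and fills the
logs.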

Better is to isolate your custom scheduled code in an entirely
separate process, with communication between your AOLserver and that
helper process.  AOLserver 4.5 definitely includes a mechanism for
doing that, but I forget what it's called.  That way, your code may
well still crash, but it will only take down the helper process rather
than your entire AOLserver.
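
To make the isolation idea concrete, here is a minimal sketch in Python.
It is not the AOLserver mechanism mentioned above, only a generic
illustration of the same principle: the risky job runs in a child
process, and the parent merely observes how it ended.  The name
run_isolated and the timeout value are hypothetical.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run Python source `code` in a separate interpreter process.

    Returns (True, stdout) on success, or (False, reason) if the helper
    crashed, exited non-zero, or ran past the timeout.  A crash in the
    helper cannot take the calling process down with it.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "helper timed out"
    if proc.returncode != 0:
        # On POSIX a negative status means death by that signal number.
        return False, "helper exited with status %d" % proc.returncode
    return True, proc.stdout
```

The cost of this design is that everything the job needs has to be
passed across the process boundary, and results have to come back the
same way.  That is usually a fair trade when the alternative is losing
the whole server.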

-- 
Andrew Piskorski <[email protected]>
http://www.piskorski.com/

