Title: RE: [AOLSERVER] ns_mutex is likely causing our AOL web server to hang - Memory problem

With 2.3.3 we use ACS and Oracle. Everything in the application seems to work fine. We have heavily tested all parts of the site, and we don't see any error or failure when the server starts acting strangely. We fixed a few syntax incompatibilities with the new version, but if anything major needed to change, we would expect to see at least some errors.

We more or less have our own version of ACS (we have added to and modified it). Given that it works with 3.3.1, is it possible to upgrade to 3.5.1 with Tcl 8.4?


-----Original Message-----
From: Peter M. Jansson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 30, 2003 9:38 PM
To: [EMAIL PROTECTED]
Subject: Re: [AOLSERVER] ns_mutex is likely causing our AOL web server
to hang - Memory problem


On Thursday, January 30, 2003, at 09:19 PM, Seena Kasmai wrote:

> Well, the strange thing is that we never see such behavior on 2.3.3 w/ Tcl
> 7.0, and we run four web servers with the same code/application. That's
> why I can't think of any code-related issue.

It's been a long time since I've used 2.3.3, but I can't help but think
that some functions in 2.3.3 are not compatible with 3.x, so I don't think
it's possible to pick up a 2.3.3 app (which was Tcl 7.6, not Tcl 7.0) and
run it directly on 3.x without some modifications. (Well, no significant
application, anyway. OK, I'm sure there's a counterexample out there
somewhere.)

> I did check the size of the cache array we use for memoizing, and it's
> not that big at the time the server is eating the memory. We were able
> to re-create the problem in 20 minutes just by clicking on various pages
> (including Tcl pages). After we stopped clicking, memory kept getting
> eaten at about 2-3 MB per second; then it stops for a while and starts
> again (with no activity) until free memory gets down to 16 MB, and then
> it uses the maximum swap file allowed until it dies.

That memory is going somewhere. Perhaps not into the memoize cache; I
only pointed out that one because you identified it in your message. I
would start generously sprinkling ns_log statements through one of the
execution paths taken by one of the pages you've identified, including
filters and traces. One possibility is that some function call you made
under 2.3.3 is now failing and the application is retrying the operation,
which could cause a lot of activity, since the retries would keep going
without ever failing outright.
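To make the retry hypothesis concrete, here is a minimal sketch in Python rather than Tcl. The application itself is Tcl on AOLserver, and every name below (`call_with_retries`, `always_fails`, `Exhausted`) is hypothetical. It shows an operation that now fails every time being retried; capping the attempts and logging each failure, much as ns_log statements would in Tcl, turns a silent spin into something visible in the log.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("retry-sketch")


class Exhausted(Exception):
    """Raised when every attempt has failed (hypothetical name)."""


def call_with_retries(operation, max_attempts=5):
    """Call operation() until it succeeds, logging each failure.

    Without the max_attempts cap, a call that always fails would loop
    forever, and if failures were swallowed, nothing would reach the log.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # a naive retry loop often catches everything
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
    raise Exhausted(f"gave up after {max_attempts} attempts")


def always_fails():
    # Stand-in for a call that worked under the old server but now fails.
    raise RuntimeError("call behaves differently under the new version")


if __name__ == "__main__":
    try:
        call_with_retries(always_fails)
    except Exhausted as exc:
        log.error("%s", exc)
```

Running it prints one warning per failed attempt followed by a final error, which is the kind of trail the ns_log statements suggested above would leave in the server log.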

Is there database activity going on?  Perhaps if you turn on verbose SQL
logging, you'll see a pattern of queries that could point you to the
problem.

> Anyhow, would you recommend upgrading to 3.4.2 or to 3.5.1 with Tcl 8.3.1?

If you are using ACS and Oracle, or OpenACS, you must use a version of
AOLserver with the arsDigita patches. If you can upgrade, meaning that you
use neither ACS nor Oracle, then you want 3.5.1, not 3.4.2. The 3.5.1
release lets you use Tcl 8.4, which is faster, among other things, but the
main advantage is that with 3.5.1, when there's a Tcl update, you can
update Tcl without updating AOLserver. So, if you use neither ACS/OpenACS
nor Oracle, I suggest upgrading to AOLserver 3.5.1.

Again, given the pathological behavior you're reporting, I strongly doubt
the problem is something as subtle as a bug in Tcl.  I think such a bug
would not manifest itself so dramatically, unless it segfaulted
immediately.

Pete.
