You are right, Andrew: we are using ACS, and I believe the version is 2.2.3. Now, info tclversion says 8.3 and info patchlevel says 8.3.2, and the directory is "aolserver/lib/tcl8.3/", so I'm not sure exactly what is running right now.
I've been digging into the application, but since everything looks healthy and no errors are occurring, I have no idea what could be causing this. We have a lot of tracing and logging in the critical sections and so forth, but as I said, nothing shows up when the webserver starts eating all the memory.
I haven't found a pattern that reliably reproduces the problem, but basically if we keep clicking through the pages for about 10 minutes (load average ~2.5), the problem shows up. That doesn't tell us much, though, because there may be a specific section that has to be hit to trigger the memory problem. Last night I tried some of our admin pages, which hit the database heavily and use a lot of Tcl. Free memory dropped by 30MB (which might be normal), and now, 12 hours or so later, usage is still at the same level, so I think it has something to do with the load and the amount of traffic.
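One way to test the load theory is to sample the nsd process's resident size every few minutes during both a busy and a quiet period, then compare the trends. The Python sketch below is a minimal, hypothetical helper (the function names, the 50% tail window, and the 1 MB/hour threshold are assumptions, not anything from AOLserver or ACS): it fits a least-squares slope to the later samples only, so a one-time jump followed by a plateau, like the 30MB drop described above, does not register as a leak, while steady growth does.

```python
from statistics import mean


def growth_rate_kb_per_hour(samples):
    """Least-squares slope of resident memory versus time.

    samples: list of (hours_since_start, rss_kb) pairs, e.g. collected
    periodically from ps or /proc. Returns the fitted slope in kB/hour.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [t for t, _ in samples]
    ys = [kb for _, kb in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def looks_like_steady_growth(samples, tail_fraction=0.5,
                             threshold_kb_per_hour=1024):
    """Fit only the later part of the run so that a one-time jump early on
    (caches and pools filling up) is not mistaken for a leak.
    The default threshold of 1 MB/hour is an arbitrary assumption."""
    start = int(len(samples) * (1 - tail_fraction))
    return growth_rate_kb_per_hour(samples[start:]) > threshold_kb_per_hour
```

For example, a run that jumps from 100000 kB to 130000 kB in the first hour and then stays flat gives a tail slope of zero and is not flagged, whereas a process growing 5000 kB every hour is.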
Would using -z (the zippy memory allocator switch) help with more tracing/monitoring?
We use ns_share heavily; could that be the cause?
Thanks,
Seena
P.S. On the memory leak subject: should I ignore the discussion I found, which I thought was similar to my problem? Could you access the messages? (I think the links I provided were broken; sorry about that.)
Here is what Kris said about the solution, which seemed to work. I have also attached a couple of emails that describe the same issue.
-----------------------------------
On the subject of memory leaks, there is a known symptom of nsd8x
where it can grow without bound in certain circumstances. We do not
yet know the cause, but it appears to be endemic to Tcl 8.3.0. If you
use nsd76 the problem completely disappears.
Kris
-------------------------------------
The next release of AOLserver (which we'll be releasing very soon) has Tcl
8.3.1 which appears to have cleared up the memory leak. It does/will have a
range-checking memory allocator, too. If you have CVS access, you can use it
right now (as of 8/8/2000, in fact).
As far as an "official comment", AOLserver is an open-source product.
Anyone with the means and the skill can help debug the server. I fail to
understand how a suggestion to move to nsd76 to solve an evident memory leak
in Tcl 8.3.0 equates to "moving to IIS", as one writer on this mailing list
so eloquently put it.
Now, as for nsd76 growing without bound: that is news to AOL Digital City.
They run nsd76 in production on some of the busiest systems in the world and
we have yet to see a memory leak in the core AOLserver 3.0 (it's always been
in various C modules we load for our applications).
It's also important to understand the difference between RSS and SZ. The
RSS, or "resident set size", is the amount of core memory being used by a
process. The SZ is the total amount of core memory plus virtual memory being
used. As any Unix administrator or developer can tell you, it is perfectly
normal and acceptable for a process to have a bigger SZ than RSS due to the
simple fact that not all data in a process' address space is used all the
time. This is very dependent on the flavor of Unix -- different systems have
different algorithms that decide when to write pages to swap. For a fairly
simple explanation of this, see http://www.freebsd.org/FAQ/misc.html, the
book "Operating System Concepts, 3e" (Silberschatz/Peterson/Galvin), "Unix
Internals" (Vahalia), and of course the Tanenbaum book.
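On Linux, the two numbers Kris describes can be read directly from /proc/<pid>/status: VmSize is the total virtual size (roughly what ps reports as VSZ) and VmRSS is the resident set size, both in kB. A minimal sketch, assuming a Linux /proc filesystem; the function names are hypothetical, and on Solaris or other Unixes ps -o vsz,rss is usually the more portable route.

```python
def parse_vm_fields(status_text):
    """Extract VmSize (total virtual size) and VmRSS (resident set size)
    from the text of /proc/<pid>/status. Values are returned in kB."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("VmSize", "VmRSS"):
            # Lines look like "VmRSS:\t  100000 kB"; keep just the number.
            fields[key] = int(value.split()[0])
    return fields


def read_vm_fields(pid):
    """Linux-only: read the memory fields for a running process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

Calling read_vm_fields on the nsd PID should normally show VmRSS well below VmSize, which, as explained above, is expected and not by itself a sign of a leak.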
Finally, about Purify. We have access to the very latest versions of
Purify. Unfortunately, Purify dumps core when encountering such innocuous
messages as "UMR." We are working on getting this issue resolved and are
using Purify on Irix in the meantime. So far we haven't found much to
suggest a problem exists in nsd76 (though we have deferred testing nsd8x
until Tcl 8.3.1 is in).
I hope this message finds understanding readers.
Regards,
Kris
---------------------------------------------------------------
-----Original Message-----
From: Andrew Piskorski [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 31, 2003 2:19 AM
To: [EMAIL PROTECTED]
Subject: Re: [AOLSERVER] ns_mutex is likely causing our AOL web server
to hung - Memory problem
On Thu, Jan 30, 2003 at 09:41:27PM -0500, Seena Kasmai wrote:
> With 2.3.3 we use ACS and we use Oracle. Everything in the application seems
> We sort of have our own version of ACS (we have added/modified it), given
> it's functioning with 3.3.1, is it possible to upgrade to 3.5.1 w/ TCL 8.4 ?
Seena, since your email address is @away.com, I figured you must be
using some flavor of ACS. But, exactly which version of the ACS was
your software based on originally? 3.4, 3.2, maybe even 2.x? And
have you ever upgraded to or backported from newer ACS versions?
I don't recall when the internationalization stuff went into ACS. The
safe bet is to stick to the same versions of AOLserver that are ok
for OpenACS. However, the fact that you were using AOLserver 2.3.3
until recently probably means that your ACS version is compatible with
ANY AOLserver 3.x version, as long as you have your Oracle driver and
any other loadable modules you need compiled for it.
The other people here are right, though: there's no way the massive
memory usage problems you're seeing are due to an AOLserver or Tcl bug.
It's been a long time now, but I don't think any of the leak problems
fixed over time in 3.x were EVER that big, not even with 3.0 before
Rob Mayoff made any of his fixes at all. Instead, it sounds like
something in your application is tripping over some AOLserver 2.3
vs. 3.3 difference.
--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com
Someone has mentioned a memory leak in nsd8x. I am running Red Hat Linux 6.1 and restart my AOLserver 3.0 almost daily because the process size grows so large. Before I activated more swap today, I had:
RAM: 212M
Swap: 210M (now 350M)
nsd8x grows from SIZE=32M RSS=17M shortly after restart to SIZE=150M
RSS=100M, or more. Occasionally it grows enough to leave less than 1M of
swap, at which point the server doesn't really work anymore. The speed
at which this happens seems variable; I can't really tell whether it is
related to the amount of traffic.
I have considered using another AOLserver to monitor and restart a stuck
server, but I think both would stop working together. A cron job of some
type might be more robust.
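The cron approach could look something like the Python sketch below. Everything specific in it is an assumption for illustration: the restart command, the 150 MB limit, and the PID-file argument are hypothetical, and it assumes a ps that accepts -o rss= -p PID (Linux procps does; other ps variants may differ). Note that it checks only memory size and whether the process still exists, not whether the server is still answering requests; catching a stuck-but-small server would also need an HTTP check.

```python
import subprocess
import sys

# Hypothetical settings: adjust for your installation.
RSS_LIMIT_KB = 150 * 1024                      # restart above ~150 MB resident
RESTART_CMD = ["/etc/init.d/nsd", "restart"]   # hypothetical restart script


def parse_ps_rss(ps_output):
    """Parse the output of `ps -o rss= -p PID` (resident size in kB).
    Returns None when ps printed nothing, i.e. the process is gone."""
    text = ps_output.strip()
    return int(text) if text else None


def rss_kb(pid):
    result = subprocess.run(["ps", "-o", "rss=", "-p", str(pid)],
                            capture_output=True, text=True)
    return parse_ps_rss(result.stdout)


def should_restart(rss, limit_kb=RSS_LIMIT_KB):
    # A missing process also counts as needing a restart.
    return rss is None or rss > limit_kb


def main(pid_file):
    with open(pid_file) as f:
        pid = int(f.read().strip())
    if should_restart(rss_kb(pid)):
        subprocess.run(RESTART_CMD, check=False)
        return 1  # non-zero exit tells cron a restart was triggered
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

A crontab line such as "*/10 * * * * /usr/bin/python3 /usr/local/bin/nsd_watchdog.py /var/run/nsd.pid" (paths hypothetical) would run the check every ten minutes, independently of the AOLserver being watched.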
When I recently tried out nsd76, I had many crashes, but I think that
was due to not using the -i flag (or using it from the console, I can't
remember which).
Any ideas?
--Tom Jackson
-------------------------------------
Interesting... I was just about to post this exact same question! On my
production Solaris 7 box, nsd8x slowly grows until the memory gets so tight
my Perl scripts won't spawn! The customer is very bitchy about letting me
restart the thing on a schedule .. I start hearing moaning about 'We never
had to do this when we were on NES' ... <sigh>
-------------------------------------
Nicholas Irving wrote:
> I had the same problem with a PERL script that I had written a couple of years
> ago: it would always bring the server down within 24-48 hours of deployment,
> but I attributed the failures to PERL being leaky in the old memory
> department and not to AOLServer. I fixed the problem by moving the script from
> PERL to TCL, and the server performed a lot better. What I think may need to
> be looked into is a port of FastCGI that allows PERL scripts to be compiled once
> and executed many times, thus saving on the number of Perl interps needed to
> be run. Somebody did develop this for AOLServer a couple of years ago but it
> may have been an internal project.
Just to add a little more information: I started looking through the C
code and thought that maybe I had discovered a cause. With Tcl 8.x, the
default cache memory (for ADPs) is private and defaults to 5MB per
interp. With sometimes 20 interps, I thought this might eventually add
up to 100MB. I switched off all caching, and AOLserver still allocated
5MB but didn't use much of it. The memory used still grew as usual over
about 36 hours to about SIZE=140MB, RSS=80MB. I switched to nsd76, but as
usual, it crashed within 6 hours, leaving only the parent process plus one child.
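The worst-case estimate above is simply the per-interp cache size multiplied by the number of interps, since each private cache can fill independently. A trivial sketch of that arithmetic (the function name is hypothetical; 5MB and 20 interps are the figures quoted above):

```python
def worst_case_private_cache_mb(interps, per_interp_mb=5):
    """Upper bound on ADP cache memory when every interp keeps its own
    private cache and each cache fills completely."""
    return interps * per_interp_mb


print(worst_case_private_cache_mb(20))  # 100
```

As noted above, the growth continued even with caching switched off, so this bound alone does not account for it.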
I am running OpenACS 3.2.2b, so a few Perl scripts do run on occasion,
but I have another 3.2.2 installation running that gets no traffic and
never seems to have memory problems. It runs the same set of Perl
scripts, on the same schedule. I have 2 other AOLserver 3.0 servers
running on the same machine, both getting minimal traffic, also with no
problems.
>Date: Wed, 7 Jun 2000 19:20:42 -0600
>From: Roberto Mello <[EMAIL PROTECTED]>
>Subject: Memory Leak on nsd8x ?
>
> Hi,
>
> I've been experiencing some weird behaviour with nsd8x... it gradually
> grabs a bigger and bigger chunk of my RAM (on a site with barely any
> hits, ~100/day).
> I am running OpenACS, so this could be something in it instead of
> nsd8x, but I doubt it because I don't see this behaviour when running
> nsd76. I was just wondering if anybody else has seen this.
> I am running it on a vanilla Debian GNU/Linux (Potato) box with
> kernel 2.2.14.
>
> -Roberto Mello
I didn't use any sophisticated method of diagnosis. I simply used good
old "top" and asked about others' experiences.
nsd8x's %MEM keeps getting bigger and bigger. About three hours ago,
nsd's %MEM was 28%. I restarted it, and it started at 5.8%. Now it is
showing 13.7% of MEM usage.
-Roberto Mello
