Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-31 Thread Seena Kasmai

You are right Andrew, we are using ACS, and I believe the version is 2.2.3. info tclversion says 8.3, but info patchlevel says 8.3.2, and the directory is aolserver/lib/tcl8.3/, so I'm not sure which version is actually running.

I've been digging into the application, but since everything looks healthy and no error is logged, I have no idea what could cause this. We have a lot of tracing and logging in the critical sections, but as I said, nothing shows up when the web server starts eating all the memory.

I haven't found a pattern that reliably reproduces the problem, but basically if we click through the pages for 10 minutes (load level ~2.5), the problem shows up. That doesn't tell us much, though, because there might be a specific section that needs to be hit to trigger the memory problem. Last night I used some of our admin pages, which hit the database heavily and involve a lot of TCL; free memory dropped by 30MB (which might be normal), and now, 12 hours or so later, usage is still at the same level. So I think it has something to do with the load and the amount of traffic.

Would using -z (the zippy memory allocator switch) help with tracing/monitoring?


We use ns_share heavily; could that be the cause?


Thanks,
Seena


P.S. On the subject of memory leaks: should I ignore the discussion I found, which I thought was similar to my problem? Could you access the messages? (I think the links I provided were broken, sorry about that.)

Here is what Kris said about the solution, which seemed to work; I have attached a couple of emails that describe the same issue.

---
On the subject of memory leaks, there is a known symptom of nsd8x
where it can grow without bound in certain circumstances. We do not
yet know the cause, but it appears to be endemic to Tcl 8.3.0. If you
use nsd76 the problem completely disappears.


Kris


-


The next release of AOLserver (which we'll be releasing very soon) has Tcl
8.3.1 which appears to have cleared up the memory leak. It does/will have a
range-checking memory allocator, too. If you have CVS access, you can use it
right now (as of 8/8/2000, in fact).


As far as an official comment, AOLserver is an open-source product.
Anyone with the means and the skill can help debug the server. I fail to
understand how a suggestion to move to nsd76 to solve an evident memory leak
in Tcl 8.3.0 equates to moving to IIS, as one writer on this mailing list
so eloquently put it.


Now, as for nsd76 growing without bound: that is news to AOL Digital City.
They run nsd76 in production on some of the busiest systems in the world and
we have yet to see a memory leak in the core AOLserver 3.0 (it's always been
in various C modules we load for our applications).


It's also important to understand the difference between RSS and SZ. The
RSS, or resident set size, is the amount of core memory being used by a
process. The SZ is the total amount of core memory plus virtual memory being
used. As any Unix administrator or developer can tell you, it is perfectly
normal and acceptable for a process to have a bigger SZ than RSS due to the
simple fact that not all data in a process' address space is used all the
time. This is very dependent on the flavor of Unix -- different systems have
different algorithms that decide when to write pages to swap. If you'd like
to read a fairly simple explanation of this, visit
http://www.freebsd.org/FAQ/misc.html, or see the books Operating System
Concepts, 3e (Silberschatz/Peterson/Galvin), Unix Internals (Vahalia), and
of course the Tanenbaum book.


Finally, about Purify. We have access to the very latest versions of
Purify. Unfortunately, Purify dumps core when encountering such innocuous
messages as UMR. We are working on getting this issue resolved and using
Purify on Irix in the meantime, and haven't found much to suggest a problem
exists in nsd76 (though we deferred testing nsd8x until Tcl 8.3.1 is put
in).


I hope this message finds understanding readers.


Regards,


Kris


---





Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Seena Kasmai
ndResponse /global/file-not-found.html
ns_param ServerBusyResponse /global/server-busy.html
ns_param ServerInternalErrorResponse /global/error.html
ns_param MaxThreads 40
#MaxBusyThreads=20
ns_param MaxBusyThreads 0
ns_param MaxWait 15
ns_param DirectoryListing none
ns_param checkstats on
ns_param checkStatsInterval 60
ns_param Fancy On

ns_section ns/server/away/adp/parsers
ns_param "fancy" ".tcl"
ns_param "fancy" ".adp"

ns_section ns/server/away/db
ns_param Pools *
ns_param DefaultPool main

ns_section ns/server/away/cgi
ns_param Map {GET /*.cgi}
ns_param Map {POST /*.cgi}

ns_section ns/server/away/adp
ns_param Map /*.adp
ns_param Map /*.help
ns_param Map /*.js
ns_param Map /*.asp
ns_param Map /*.htm
ns_param Map /*.html
ns_param "DefaultParser" "fancy"

ns_section ns/server/away/module/nslog
ns_param enablehostnamelookup Off
ns_param file /home/away-logs/away-n5.log
ns_param maxbackup 10
ns_param rollday *
ns_param rollfmt %Y-%m-%d-%H.%M
ns_param rollhour 0
ns_param rollonsignal On
ns_param rolllog On
ns_param ExtendedHeaders Referer,User-Agent,Host,Cookie

ns_section ns/server/away/module/nsperm
ns_param model Small
ns_param enablehostnamelookup Off

ns_section ns/server/away/module/nssock
ns_param timeout 120
ns_param Address
#ns_param Port
ns_param Hostname
#Hostname=

ns_section ns/server/away/module/nssock_atb
ns_param timeout 120
#ns_param port 8084
ns_param Address
ns_param Hostname

ns_section ns/server/away/module/nsopenssl
ns_param ServerAddress
#ns_param Port 8085
ns_param ServerHostname
ns_param CertFile /home/nsadmin/servers/away/test-cert.pem
ns_param KeyFile /home/nsadmin/servers/away/test-key.pem
ns_param SockServerCertFile /home/nsadmin/servers/away/test-cert.pem
ns_param SockServerKeyFile /home/nsadmin/servers/away/test-key.pem
ns_param SockServerProtocols "SSLv2, SSLv3, TLSv1"
ns_param SockServerCipherSuite "ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP"
ns_param SockServerSessionCache false
ns_param SockServerSessionCacheID 2
ns_param SockServerSessionCacheSize 512
ns_param SockServerSessionCacheTimeout 300
ns_param SockServerPeerVerify true
ns_param SockServerPeerVerifyDepth 3
ns_param SockServerCADir internal_ca
ns_param SockServerCAFile internal_ca.pem
ns_param SockServerTrace false

ns_section ns/server/away/MimeTypes
ns_param Default text/plain
ns_param NoExtension text/plain
ns_param .pcd image/x-photo-cd
ns_param .prc application/x-pilot
ns_param .css text/css
ns_param .doc application/msword
ns_param .rtf application/msword
ns_param .xls application/msexcel
ns_param .xlc application/msexcel
ns_param .fm4 application/x-framemaker
ns_param .fm5 application/x-framemaker
ns_param .ppt application/vnd.ms-powerpoint
ns_param .pot application/vnd.ms-powerpoint
ns_param .pps application/vnd.ms-powerpoint
ns_param .dvi application/x-dvi

ns_section ns/server/away/nscache
ns_param cacheADP on

ns_section ns/server/away/module/nscache/adp
ns_param dostat on
ns_param defaultexpires 3600
ns_param maxsize 1

ns_section ns/server/away/modules
ns_param nsperm nsperm.so
ns_param nslog nslog.so
ns_param nssock nssock.so
ns_param nsopenssl nsopenssl.so
ns_param nssock_atb nssock.so
#nsftp=nsftp.so
ns_param nscache nscache.so

ns_section ns/server/away/tcl
ns_param Library /web/away/tcl

ns_section ns/servers
ns_param away away

ns_section ns/setup
ns_param Enabled Off
ns_param Port 9799
ns_param Password t2o8WCGDYZddU

ns_section ns/threads
ns_param systemscope on

ns_section ns/server/away/acs
ns_param PrimaryServerP 0
ns_param ClickTestServerP 1

ns_section ns/server/away/acs/cs/logging
# is clickstreaming on?
ns_param EnabledP 0
# work with old systems - do all session management by ourselves?
ns_param LegacyP 1
# which pages are being served and should be logged?
# not used yet (or maybe ever)
ns_param PageExtensions {tcl, adp, html}
# how many user sessions before a user is not a new user?
ns_param NewUserThreshold 10
# logfile - ".%Y-%m-%d" is added to this
ns_param Logfile /home/click-logs/away-n2-cs.log
# archive template
ns_param ArchiveFile /home/nsadmin/log/away/old-cs-logs/away4-cs-log




  -Original Message-
  From: Seena Kasmai [mailto:[EMAIL PROTECTED]]
  Sent: Monday, January 27, 2003 6:08 PM
  To: [EMAIL PROTECTED]
  Subject: Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

  Nathan - If you look at the code, it does lock before attempting any
  manipulation of that array.

  #
  ns_share counter_A
  ns_share counter_B
  ns_share -init { set counter_mutex [ns_mutex create] } counter_mutex

  proc X {i} {
      ns_share counter_A
      ns_share counter_B
      ns_share counter_mutex
      ns_mutex lock $counter_mutex
      incr counter_A($i) 1
      incr counter_B($i) 1
      ns_mutex unlock $counter_mutex
  }

  proc_doc Y {} {
      ns_share counter_A
      ns_share counter_B
      ns_share counter_mutex
      ns_mutex lock $counter_mutex
      foreach i_index [array names counter_A] {
          set temp_counter_A($i_index) $counter_A($i_index)
          set temp_counter_B($i_index) $counter

Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Seena Kasmai



I found some old messages talking about a "Memory Leak" in AOLserver 3 (I think I'm
running into the same problem, given the memory and slowness issues we have right
now).

According to the answers, the source of the problem is Tcl 8.3.0 and the
solution is to upgrade the TCL library to 8.3.1:
http://listserv.aol.com/cgi-bin/wa?A2=ind0008L=aolserverD=0I=-3X=67CBE07276211DD16C[EMAIL PROTECTED]P=1878
(and the solution from Kriston:
http://listserv.aol.com/cgi-bin/wa?A2=ind0008L=aolserverD=0I=-3X=0C65431B8EB0007184[EMAIL PROTECTED]P=3300)

Would someone please be kind enough to show me how to upgrade only my TCL to
8.3.1 on my AOLserver/3.3.1+ad13 w/TCL 8.3?

Thank you,
Seena

  -Original Message-
  From: Seena Kasmai [mailto:[EMAIL PROTECTED]]
  Sent: Thursday, January 30, 2003 11:09 AM
  To: [EMAIL PROTECTED]
  Subject: Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

  Hello again,

  I finally put exception handling (catch) after the ns_mutex lock all across
  the application, to make sure we always unlock the mutex. But again, after
  running some traffic to the web server, requests to the page that actually
  calls ns_mutex started getting stuck, and eventually the server locked up.

  Then I suspected that maybe we are locking that mutex simultaneously (from
  the 2 procs) and that this somehow creates a conflict. After removing the
  lock from the proc that increments the array, I could never lock up the
  server!! So it looks like we have some sort of conflict when locking the
  same mutex, although I assume the lock requests should go into a queue of
  some sort and be released in order. I wrote a test page that only locks a
  mutex (and does not unlock it). I ran this page 10 times, and all of the
  requests got stuck in the queue; then I ran an unlock, and every time I ran
  it, the first request in the queue was released. So the functionality seems
  to be working, but I still don't know why the server gets into trouble in
  that case.

  Another issue that might be related (or maybe not): I have noticed that
  while AOLserver is running, free memory keeps shrinking, and eventually the
  system runs out of memory and the web server dies. Initially, when AOLserver
  comes up, the system has about 840MB of free memory. So far, in about every
  24-hour period, free memory drops below 16MB and eventually the server
  crashes (and free memory goes back to 875MB). Here is a snapshot of top when
  the server starts up:

  CPU states: 100% idle, 0.0% user, 0.0% kernel, 0.0% iowait, 0.0% swap
  Memory: 1024M real, 829M free, 58M swap in use, 4809M swap free

    PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME   CPU COMMAND
  27834 nsadmin    8  59    0   52M   47M sleep   0:45 0.02% nsd8x

  The only thing that can use a lot of memory while traffic is running on the
  site is that our application uses Memoize a lot, which caches the results of
  database queries in a list-of-lists format. But I saw the server eating 1MB
  of memory per second (according to "top") even when nothing was going on on
  the server!

  Again, please note that the same code/application and setup is running fine
  with AOLserver version 2.3.3 / TCL 7, so I can't think of any nasty bug or
  infinite loop that could exist. I've been looking closely at the error logs
  and there is no error. Any comment or idea that anyone may have about why
  the new version is acting differently in this situation is greatly
  appreciated.

  BTW, here is the configuration file (should I have attached it!?):
  
  ## Translated on Thu Jan 16 02:58:05 EST 2003
  # from .ini format with translate-ini
  ## config file for a Netra farm box

  ns_section ns/db/pools
  ns_param main main
  ns_param subquery subquery
  ns_param secondary secondary
  ns_param secondary_subquery secondary_subquery
  ns_param log log
  ns_param clickstream clickstream
  ns_param search search

  ns_section ns/db/drivers
  #ora8=ora8.2.0.1-816-.so
  ns_param ora8 /home/nsadmin/bin/ora8.so

  ns_section ns/db/pool/main
  ns_param Driver ora8
  ns_param Connections 6
  ns_param DataSource ora8_tcp
  ns_param User
  ns_param Password
  ns_param Verbose On
  ns_param ExtendedTableInfo On
  ns_param LogSQLErrors On

  ns_section ns/db/pool/subquery
  ns_param Driver ora8
  ns_param Connections 6
  ns_param DataSource ora8_tcp
  ns_param User
  ns_param Password
  ns_param Verbose On
  ns_param ExtendedTableInfo On
  ns_param LogSQLErrors On

  ns_section ns/db/pool/secondary
  ns_param Driver ora8
  ns_param Connections 6
  ns_param DataSource testds
  #DataSource=ora8_tcp
  ns_param User
  ns_param Password
  ns_param Verbose on
  ns_param ExtendedTableInfo On
  ns_param LogSQLErrors On

  ns_section ns/db/pool/secondary_subquery
  ns_param Driver ora8
  ns_param Connections 6
  ns_param DataSource testds
  #DataSource=ora8_tcp
  ns_param User
  ns_param Password
  ns_param Verbose on
  ns_param ExtendedTableInfo On
  ns_param LogSQLErrors On
  
  n

Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Peter M. Jansson
On Thursday, January 30, 2003, at 07:58 PM, Seena Kasmai wrote:


> Would some please be kind enough and assist me how to only upgrade my TCL
> to 8.3.1 from my AOLserver/3.3.1+ad13 w/TCL 8.3 ??


For versions of AOLserver prior to 3.5, the Tcl implementation was tightly
tied to the AOLserver, and the only way to change the version of Tcl was
to use a different AOLserver version.  Given that you're using the
3.3.1+ad13 version of AOLserver, you're probably using OpenACS (or ACS
itself), and switching to AOLserver 3.5.2 is not possible.


> Another issue that might be related (or may be not), is that I have
> noticed, while the AOLServer is running, the memory keeps getting shrink
> and eventually system runs out of memory and web serve dies. Initially
> when AOLServer comes up, system has about 840MB memory. So far in about
> every 24-hour period, the memory becomes under 16MB and eventually server
> crashes (and memory gets back to 875MB). Here is a snap shot of TOP when
> server starts up:


Seena, this behavior is not caused by a memory leak.  There is no leak
that serious in AOLserver.  Plenty of folks have had 3.3.1 systems that
take fair amounts of traffic and don't consume 800 MB of memory in 24
hours.  There is something in your application that is grabbing memory and
making it unavailable to the rest of the system.  Even though Tcl uses
garbage collection, Tcl can't GC memory that's being referenced (such as
in a Memoize cache).

Can you put some logging around your memoization to try to see what the
size of the memoize cache is?  Perhaps you could register a pre-auth filter
that captures the size of the memoize cache, and then register a trace
that computes the size again (after the request has run, because it's a
trace) and logs the difference?  That might give you a handle on whether
one request is particularly demanding on memory.
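
The measurement described above could be sketched roughly as follows. This is only an illustration: array_size and cache_delta_message are hypothetical helper names, the size is an approximation (characters in keys and values, ignoring Tcl's per-entry overhead), and the real name of the memoize cache array is not given in this thread. In AOLserver the two measurements would be taken from procs registered with ns_register_filter (one preauth, one trace) and written out with ns_log; the filter proc signature differs between AOLserver versions, so that wiring is left out here.

```tcl
# Hypothetical helper: approximate how much data a Tcl array holds by
# summing the lengths of its keys and values (per-entry overhead ignored).
proc array_size {arrayName} {
    upvar 1 $arrayName arr
    set total 0
    foreach {key value} [array get arr] {
        incr total [string length $key]
        incr total [string length $value]
    }
    return $total
}

# Hypothetical helper: format the before/after difference for one request,
# ready to be handed to ns_log inside AOLserver.
proc cache_delta_message {url before after} {
    set delta [expr {$after - $before}]
    return "memoize cache: $url grew by $delta chars ($before -> $after)"
}
```

Logging only requests whose delta is above some threshold would keep the server log readable under load.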

Even if you were able to update your Tcl, I think that, given the
magnitude of your memory issue, you would not see a meaningful improvement.

Pete.



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Seena Kasmai

Well, the strange thing is that we never saw this behavior on 2.3.3 w/TCL 7.0, where we run 4 web servers with the same code/application. That's why I can't think of any code-related issue.

I did check the size of the cache array we use for memoizing, and it's not that big at the time the server is eating memory. We were able to re-create the problem in 20 minutes just by clicking on various pages (including TCL pages). After we stopped clicking, memory kept being consumed at about 2-3MB per second; it stops for a while and then starts again (with no activity) until free memory gets down to 16MB, and then the server uses the maximum swap allowed until it dies.

Anyhow, would you recommend upgrading to 3.4.2 or 3.5.1 w/ TCL 8.3.1?


Thanks Pete for your follow up,
Seena




Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Nathan Folkman

In a message dated 1/30/03 9:27:18 PM, [EMAIL PROTECTED] writes:


> Anyhow, would you recommend to upgrade to 3.4.2 or 3.5.1 w/ TCL 8.3.1 ?


3.5.x is Tcl 8.4.x only. I'd recommend upgrading to 3.5 if you're going to try and upgrade. It will put you in a good position to move to 4.0 once it gets released.

- Nathan


Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Peter M. Jansson
On Thursday, January 30, 2003, at 09:19 PM, Seena Kasmai wrote:


> Well, the strange thing is we never see such a behavior on 2.3.3 w/TCL
> 7.0, and we run 4 web server with the same code/application. That's why I
> can't think of any code related issue.


It's been a long time since I've used 2.3.3, but I can't help but think
that there are some functions in 2.3.3 that are not compatible with 3.x,
so I don't think it's possible to pick up a 2.3.3 app (which was Tcl 7.6,
not Tcl 7.0) and run it directly on 3.x without some modifications.  (Well,
no significant application, anyway.  OK, I'm sure there's a
counterexample out there somewhere.)


> I did check the size of the cache array we use for Memoizing stuff, and
> it's not that big at the time server is eating the memory. We were able
> to re-create the problem in 20 Minutes just by clicking on various pages
> (including TCL pages) and after we stop clicking the memory was kept
> getting eaten like 2-3MB per seconds and then it stops for a while and
> the starts again (while no activity), until it gets down to 16MB, and
> then it uses the max swap file allowed until it dies.


That memory is going somewhere.  Perhaps not into the memoize cache; I
only pointed out that one because you identified it in your message.  I
would start generously sprinkling ns_log statements through one of the
execution paths taken by one of the pages you've identified, including
filters and traces.  One possibility is that some function call you made
under 2.3.3 is now failing, and the application is retrying the operation,
which could cause a lot of activity, since the retries will not succeed
either.

Is there database activity going on?  Perhaps if you turn on verbose SQL
logging, you'll see a pattern of queries that could point you to the
problem.


> Anyhow, would you recommend to upgrade to 3.4.2 or 3.5.1 w/ TCL 8.3.1 ?


If you are using ACS and Oracle, or OpenACS, you must use a version of
AOLserver with arsDigita patches.  If you can upgrade, meaning that you
don't use any ACS stuff nor Oracle, then you want to use 3.5.1, and not
3.4.2.  The 3.5.1 release will allow you to use Tcl 8.4, which is faster,
among other things, but the main thing is that with 3.5.1, if there's a
Tcl update, you can update Tcl without updating AOLserver.  So, if you do
not use ACS or OpenACS, nor Oracle, I suggest upgrading to AOLserver 3.5.1.

Again, given the pathological behavior you're reporting, I strongly doubt
the problem is something as subtle as a bug in Tcl.  I think such a bug
would not manifest itself so dramatically, unless it segfaulted
immediately.

Pete.



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Seena Kasmai

With 2.3.3 we use ACS and we use Oracle. Everything in the application seems to be working fine, and we heavily tested all parts of the site; we don't see any error or failure when the server starts acting strangely. We fixed a few syntax issues that were not compatible with the new version, but if anything major needed to be changed, we should at least see some errors.

We more or less have our own version of ACS (we have added to and modified it). Given that it's functioning with 3.3.1, is it possible to upgrade to 3.5.1 w/ TCL 8.4?



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Andrew Piskorski
On Thu, Jan 30, 2003 at 07:58:55PM -0500, Seena Kasmai wrote:

> Would some please be kind enough and assist me how to only upgrade my TCL to
> 8.3.1 from my AOLserver/3.3.1+ad13 w/TCL 8.3 ??

3.3+ad13 ships with Tcl 8.3.2.  You can verify this.  If you compiled
from source, look for the directory aolserver/tcl8.3.2/.  More
conclusively, just display the results of running info tclversion
and info patchlevel in a Tcl page.
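
As a small sketch of that check: info tclversion reports only major.minor, while info patchlevel reports the full release, so seeing "8.3" from one and "8.3.2" from the other is consistent rather than contradictory. The proc name tcl_version_report is made up for this example; how the string gets onto a page (for instance via ns_return) depends on the AOLserver version and is not shown.

```tcl
# Report both version strings that the running Tcl interpreter knows about.
proc tcl_version_report {} {
    set version [info tclversion]  ;# major.minor only, e.g. 8.3
    set patch   [info patchlevel]  ;# full release, e.g. 8.3.2
    return "tclversion=$version patchlevel=$patch"
}
```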

--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung - Memory problem

2003-01-30 Thread Andrew Piskorski
On Thu, Jan 30, 2003 at 09:41:27PM -0500, Seena Kasmai wrote:
> With 2.3.3 we use ACS and we use Oracle. Everything in the application seems
>
> We sort of have our own version of ACS (we have added/modified it), given
> it's functioning with 3.3.1, is it possible to upgrade to 3.5.1 w/ TCL 8.4 ?

Seena, since your email address is @away.com, I figured you must be
using some flavor of ACS.  But, exactly which version of the ACS was
your software based on originally?  3.4, 3.2, maybe even 2.x?  And
have you ever upgraded to or backported from newer ACS versions?

I don't recall when the internationalization stuff went into ACS.  The
safe bet is to stick to the same versions of AOLserver that are ok
for OpenACS.  However, the fact that you were using AOLserver 2.3.3
until recently probably means that your ACS version is compatible with
ANY AOLserver 3.x version, as long as you have your Oracle driver and
any other loadable modules you need compiled for it.

The other people here are right though, there's no way the massive
memory usage problems you're seeing are due to an AOLserver or Tcl bug.
It's been a long time now, but I don't think any of the leak problems
fixed over time in 3.x were EVER that big, not even with 3.0 before
Rob Mayoff made any of his fixes at all.  Instead, it sounds like
something in your application is tripping over some AOLserver 2.3
vs. 3.3 difference.

--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Seena Kasmai



Hey Nathan!

Here is the simplified version of the code, which shows how we are using
ns_mutex in our application. Basically, proc X is called a lot (more than
100 times a minute) across the application, and proc Y is scheduled to run
every ~5 minutes. The primary reason for using ns_mutex here is to protect
the counters' values, while they are being manipulated
(incremented/written/cleared), from being accessed by other threads.

Please feel free to criticize this code as much as you can!

Again, we are seeing that AOLserver 3.3.1 gets into trouble after these
procs are called heavily (eventually the server goes down). Just by taking
out the ns_mutex lines, we have no problem! Previously we never had any
problem running these on version 2.3.3.

Meanwhile, regarding ns_share: what is the major issue with it that makes
people discourage its use?

Thanks!
--Seena

#

ns_share counter_A
ns_share counter_B
ns_share -init { set counter_mutex [ns_mutex create] } counter_mutex

# Called on most page hits: bump both counters for index i under the mutex.
proc X {i} {
    ns_share counter_A
    ns_share counter_B
    ns_share counter_mutex

    ns_mutex lock $counter_mutex
    incr counter_A($i) 1
    incr counter_B($i) 1
    ns_mutex unlock $counter_mutex
}

# Scheduled every ~5 minutes: copy the counters aside and clear them.
# (proc_doc expects a documentation string between the args and the body.)
proc_doc Y {} "Snapshot and reset the shared counters." {
    ns_share counter_A
    ns_share counter_B
    ns_share counter_mutex

    ns_mutex lock $counter_mutex
    foreach i_index [array names counter_A] {
        set temp_counter_A($i_index) $counter_A($i_index)
        set temp_counter_B($i_index) $counter_B($i_index)
        # unset takes a variable name, not its value
        unset counter_A($i_index)
        unset counter_B($i_index)
    }
    ns_mutex unlock $counter_mutex

    ## writing $temp_counter_A and $temp_counter_B arrays to database
}

#
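
One thing worth checking in code shaped like the above: if anything between ns_mutex lock and ns_mutex unlock raises a Tcl error, the unlock never runs and every later caller blocks. For example, before Tcl 8.5, incr on an array element that does not exist yet is an error, which could happen right after the scheduled proc has unset the counters. Below is a minimal sketch of a guard, assuming only the ns_mutex lock/unlock subcommands already used in this thread; with_mutex is a hypothetical helper name, not an AOLserver command, and this is not claimed to be the cause of the hang.

```tcl
# Hypothetical helper: evaluate body in the caller's scope while holding
# the mutex, and release the mutex even if the body raises an error.
# Other exceptional codes (break, continue, return) are not propagated here.
proc with_mutex {mutex body} {
    ns_mutex lock $mutex
    set code [catch {uplevel 1 $body} result]
    ns_mutex unlock $mutex
    if {$code == 1} {
        # Re-raise the original error only after the mutex is released.
        return -code error -errorinfo $::errorInfo -errorcode $::errorCode $result
    }
    return $result
}

# Sketch of proc X rewritten to use the helper; ns_share is used exactly
# as in the message above.
proc X {i} {
    ns_share counter_A
    ns_share counter_B
    ns_share counter_mutex
    with_mutex $counter_mutex {
        incr counter_A($i) 1
        incr counter_B($i) 1
    }
}
```

With this shape an error still reaches the server log, but it can no longer leave the mutex held.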



  -Original Message-
  From: Nathan Folkman [mailto:[EMAIL PROTECTED]]
  Sent: Friday, January 24, 2003 7:08 PM
  To: [EMAIL PROTECTED]
  Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to...

  In a message dated 1/24/2003 4:47:20 PM Eastern Standard Time,
  [EMAIL PROTECTED] writes:

  > Any more inputs regarding this matter will greatly be appreciated.

  Any chance you could provide a few snippets of code showing where you are
  locking and unlocking, and the work you are doing in between? Hard to tell
  what the problem is. If I had to guess, however, it sounds like you are
  deadlocked. Perhaps you are locking, throwing an uncaught error, and never
  unlocking? Or maybe you are just experiencing contention around your
  database which is causing other requests to back up waiting for that
  resource... If you can provide some more detailed information, including
  anything odd you see in the server log, that would be great! Also might
  want to check the SYSLOG for any database errors which could point to the
  problem.

  Also, have you considered upgrading to at least AOLserver 3.4.2 or even
  better 3.5.1? Would need more information to know exactly what you are
  trying to do, but you might be able to use the nsv_incr command for your
  counters. The nsv data structure is similar to ns_share variables in that
  you can share variables between multiple threads/interps. The nsv
  implementation is a lot cleaner, and handles all the synchronization for
  you. Plus, as I mentioned before, there's a nifty nsv_incr command
  specifically for things like counters. ns_share is not recommended,
  especially when running Tcl 8.x.

  - Nathan
  ---
  
  Thanks Andrew for your input. 
  We use Solaris as well and the AOLserver seems to work fine in
  any other situations except when ns_mutex comes to play. Here is more details
  how we are using it.
  We use ns_mutex inside a scheduled proc, which writes a cashed
  array of numbers (counters) to the database. This proc is scheduled for every
  5 minutes, to lock that array - so that no other process can manipulate that
  array at the moment it's being written to db - writes the numbers to db,
  resets the counters, and then unlock that array using ns_mutex
  unlock.
  Notice that this array is ns_share`ed. While everything seems
  to function and be happy, after the webserver gets more traffic, then we'll
  start seeing that all the process that have attempted to access that array,
  are waiting in the queue. At this stage the nsd process will take most of the
  CPU usage and the webserver almost doesn't respond the http requests. If we
  stop the traffic eventually (sometimes after a long time) the server will come
  back up to a normal operation and the queue will become empty. 
  I modified that scheduled proc only to not lock that array (no
  ns_mutex use), and after making this change, webserver never got in to
  trouble. That's why I'm almost certain that ns_mutex is causing
  problems.
  I suspect maybe combination of ns_share and ns_mutex on that
  array might be the cause of this. I also noticed doing "upvar" on a ns_shared
  variable doesn't work !
  Any more inputs regarding this matter will greatly be
  appreciated. 
  Thanks Seena 
  -Original 

Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Jim Wilcoxson
Put catches around your locked code and you may find a bug, for example,
incrementing an array var that doesn't exist or unsetting an array var
that doesn't exist.  Without ns_mutex calls, the code may blow up but
your server won't lock up.

Jim


 Hey Nathan!

 Here is the simplified version of the code which shows how we are using
 ns_mutex in our application. Basically the proc A, is being called a lot (
 more than 100 times in a minute) across the applications, and proc B is
 scheduled to run every ~5 minutes. Here the primary reason for using
 ns_mutex is to protect counters' values while it's being manipulated (
 incremented/written/cleared) from being accessed by other threads.

 Please feel free to criticize this code as much as you can!

 Again we are seeing that AOLserver 3.3.1 gets into trouble after calling
 this procs heavily (eventually the server goes down).  By only taking out
 the ns_mutex lines, we'll have no problem!. Previously we never had any
 problem running these on Version 2.3.3.

 In the meanwhile regarding the ns_share, what is the major issue with it
 that people encourage not to use it ?

 Thanks!
 --Seena

 #
 ns_share counter_A
 ns_share counter_B
 ns_share -init { set counter_mutex [ns_mutex create] } counter_mutex


 proc X {i} {

  ns_share counter_A
  ns_share counter_B
  ns_share counter_mutex

  ns_mutex lock $counter_mutex

  incr counter_A($i) 1
  incr counter_B($i) 1

  ns_mutex unlock $counter_mutex

 }


 proc_doc Y {} {

  ns_share counter_A
  ns_share counter_B
  ns_share counter_mutex

  ns_mutex lock $counter_mutex

  foreach i_index [array names counter_A] {

   set temp_counter_A($i_index) $conter_A($i_index)
   set temp_counter_B($i_index) $conter_B($i_index)

   unset $conter_A($i_index)
   unset $conter_B($i_index)

  }

  ns_mutex unlock $counter_mutex

  ## writing $temp_counter_A and $temp_counter_B arrays to database

 }

 #


 -Original Message-
 From: Nathan Folkman [mailto:[EMAIL PROTECTED]]
 Sent: Friday, January 24, 2003 7:08 PM
 To: [EMAIL PROTECTED]
 Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL
 webserver to...


 In a message dated 1/24/2003 4:47:20 PM Eastern Standard Time,
 [EMAIL PROTECTED] writes:



 Any more inputs regarding this matter will greatly be appreciated.



 Any chance you could provide a few snippets of code showing where you are
 locking and unlocking, and the work you are doing in between? Hard to tell
 what the problem is. If I had to guess, however, it sounds like you are dead
 locked. Perhaps you are locking, and throwing an un-caught error, and never
 unlocking? Or maybe you are just experiencing contention around your
 database which is causing other requests to back up waiting for that
 resource... If you can provide some more detailed information, including
 anything odd you see in the server log that would be great! Also might want
 to check the SYSLOG for any database errors which could point to the
 problem.

 Also, have you considered upgrading to at least AOLserver 3.4.2 or even
 better 3.5.1? Would need more information to know exactly what you are
 trying to do, but you might be able to use the nsv_incr command for your
 counters.

 The nsv data structure is similar to ns_share variables in that you can
 share variables between multiple threads/interps. The nsv implementation is
 a lot cleaner, and handles all the synchronization for you. Plus, as I
 mentioned before, there's a nifty nsv_incr command specifically for things
 like counters. ns_share is not recommended, especially when running Tcl 8.x.

 - Nathan

 ---

 Thanks Andrew for your input.

 We use Solaris as well and the AOLserver seems to work fine in any other
 situations except when ns_mutex comes to play. Here is more details how we
 are using it.

 We use ns_mutex inside a scheduled proc, which writes a cached array of
 numbers (counters) to the database. This proc is scheduled for every 5
 minutes, to lock that array - so that no other process can manipulate that
 array at the moment it's being written to db - writes the numbers to db,
 resets the counters, and then unlock that array using ns_mutex unlock.

 Notice that this array is ns_share`ed. While everything seems to function
 and be happy, after the webserver gets more traffic, then we'll start seeing
 that all the process that have attempted to access that array, are waiting
 in the queue. At this stage the nsd process will take most of the CPU usage
 and the webserver almost doesn't respond the http requests. If we stop the
 traffic eventually (sometimes after a long time) the server will come back
 up to a normal operation and 

Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Rich Fredericks
In a message dated 1/27/2003 11:21:51 AM Eastern Standard Time, [EMAIL PROTECTED] writes:

proc_doc Y {} {
 
 ns_share counter_A
 ns_share counter_B
 ns_share counter_mutex 
 
 ns_mutex lock $counter_mutex
 
 foreach i_index [array names counter_A] {
 
 set temp_counter_A($i_index) $conter_A($i_index)
 set temp_counter_B($i_index) $conter_B($i_index)
 
 unset $conter_A($i_index)
 unset $conter_B($i_index)
 
 }


Is the above the actual snippet from the code? If so, my guess is the typos ($conter_A, $conter_B instead of $counter_A  $counter_B) are throwing errors and the mutex is not getting freed, causing deadlocking in your app. Removing the ns_mutex lines from the code wouldn't fix the errors, but the deadlocks would not occur. Are there any errors in your server log during the times the proc_doc Y runs?

~Rich
___
R  i  c  h F  r  e  d  e  r  i  c  k  s
Software Engineer
AOL Web Services  Publishing
p: 703.265.0364  
e: [EMAIL PROTECTED]


Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Seena Kasmai

Good point, but there is logic before these lines (which I have taken out; the actual code is a couple of pages, mostly pre-processing and error checking) that takes care of the errors.

I don't think an exception/error is the case, especially because the same code has been working for years on a very high traffic website.

Sorry, it looks like I had typos in the sample code, but you see the point anyway. Here is the code again:


# 
ns_share counter_A
ns_share counter_B
ns_share -init { set counter_mutex [ns_mutex create] } counter_mutex 

proc X {i} { 
ns_share counter_A
ns_share counter_B
ns_share counter_mutex 
ns_mutex lock $counter_mutex 
incr counter_A($i) 1
incr counter_B($i) 1

ns_mutex unlock $counter_mutex 

}
proc_doc Y {} { 
ns_share counter_A
ns_share counter_B
ns_share counter_mutex 
ns_mutex lock $counter_mutex

foreach i_index [array names counter_A] {

set temp_counter_A($i_index) $counter_A($i_index)
set temp_counter_B($i_index) $counter_B($i_index)

unset $counter_A($i_index)
unset $counter_B($i_index)

} 
ns_mutex unlock $counter_mutex

## writing $temp_counter_A and $temp_counter_B arrays to database

}
# 




-Original Message-
From: Jim Wilcoxson [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 27, 2003 11:41 AM
To: [EMAIL PROTECTED]
Subject: Re: [AOLSERVER] ns_mutex is likely causing our AOL web server
to hung



Put catches around your locked code and you may find a bug, for example,
incrementing an array var that doesn't exist or unsetting an array var
that doesn't exist. Without ns_mutex calls, the code may blow up but
your server won't lock up.


Jim




Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Michael Richman
In a message dated 1/27/2003 10:43:28 AM Central Standard Time, [EMAIL PROTECTED] writes:

foreach i_index [array names counter_A] {

 set temp_counter_A($i_index) $conter_A($i_index)
 set temp_counter_B($i_index) $conter_B($i_index)
 
 unset $conter_A($i_index)
 unset $conter_B($i_index)
 
}


Is the above the actual snippet from the code? If so, my guess is the typos ($conter_A, $conter_B instead of $counter_A $counter_B) are throwing errors and the mutex is not getting freed, causing deadlocking in your app. Removing the ns_mutex lines from the code wouldn't fix the errors, but the deadlocks would not occur. Are there any errors in your server log during the times the proc_doc Y runs?

I'm guessing the code above is not the actual code, but in addition to Rich's comment, your "unset" lines should not have var substitution ($), but rather should just be the varname itself:

 unset conter_A($i_index)
 unset conter_B($i_index)
 ^-- should be no "$"
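
To see why the "$" matters, here is a minimal sketch that runs in plain tclsh. The array contents are made up for illustration, and an ordinary local array stands in for the ns_share'd one:

```tcl
# Hypothetical array standing in for the shared counter_A.
array set counter_A {home 3 search 7}

# With "$", Tcl substitutes the element's value first, so unset receives "3"
# and looks for a variable literally named 3, which does not exist.
set with_dollar [catch {unset $counter_A(home)} msg_with]

# Without "$", unset receives the name counter_A(home) and removes the element.
set without_dollar [catch {unset counter_A(home)} msg_without]

puts "with \$:    catch code $with_dollar ($msg_with)"
puts "without \$: catch code $without_dollar, remaining: [array names counter_A]"
```

If a failing unset like the first one ran between ns_mutex lock and ns_mutex unlock without a catch, the unlock would never be reached.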

-- michael

__
michael richman 
princ software engineer
aol infrastructure dev
214.442.6048


Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Seena Kasmai



Sorry, there is no $ sign in the actual code.

So, is it worth trying to substitute ns_share with nsv stuff (nsv_set, nsv_get) to see if the problem goes away?

Thanks,
Seena



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Andrew Piskorski
On Mon, Jan 27, 2003 at 12:25:32PM -0500, Seena Kasmai wrote:

 So, is it worth trying to substitute ns_share with nvs stuff (nsv_set 
 nsv_get) to see if the problem goes away ?

Yes!  With AOLserver 3.x or 4.x, you should always be using nsv
instead of ns_share if you can.

--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Rob Mayoff
+-- On Jan 27, Seena Kasmai said:
 sorry, there is no $ sign in the actual code.

 So, is it worth trying to substitute ns_share with nvs stuff (nsv_set 
 nsv_get) to see if the problem goes away ?

Your most effective action, if you want to maximize the utility of
the advice from this mailing list, would be to create a test case
that reproduces the problem, and post it in its entirety. Posting a
simplified version of your problematic production code is not very
helpful, because the simplified version is likely to omit whatever is
causing the problem.

That said, I doubt that using nsv_* instead of ns_share will help.
Given your description of the problem, the most likely cause is that
an error is occurring in a critical section (a section where the mutex
is locked), preventing the Tcl interpreter from reaching the ns_mutex
unlock command. You have not yet proved to us that this is not the
case.

So the next logical step (other than creating a test case) is to test
the hypothesis that such is the case, by putting catch commands around
your critical sections. For example, suppose the critical section looks
like this:

ns_mutex lock L
SCRIPT
ns_mutex unlock L

Then you should change that to this:

ns_mutex lock L
set code [catch {
    SCRIPT
} result]
ns_mutex unlock L
if {$code != 0} {
    return -code $code -errorinfo $::errorInfo \
        -errorcode $::errorCode $result
}

(You'll need to use different variable names if you already have
variables named code and result.) You can see that this guarantees
that L will be unlocked, no matter what happens when SCRIPT is executed.

Another approach would be to create a procedure like this:

proc ns_mutex_eval {lock script} {

    ns_mutex lock $lock
    set code [catch {uplevel 1 $script} result]
    ns_mutex unlock $lock

    return -code $code -errorinfo $::errorInfo \
        -errorcode $::errorCode $result
}

Then you would change the example critical section above to this:

ns_mutex_eval L {
SCRIPT
}

This way you don't have to worry about reusing the variable names code
and result, and you don't have to repeat as much code at each critical
section.
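
As a rough illustration of the ns_mutex_eval idea, the sketch below forces an error inside a critical section and checks that the error still reaches the caller after the unlock. Several things here are assumptions rather than anything from the thread: the ns_mutex stub (only defined when the real command is missing, so the code also runs in plain tclsh), the ::held counter, the plain array standing in for the shared counters, and the small change of passing -errorinfo only when an error actually occurred.

```tcl
# Stand-in for AOLserver's ns_mutex so the sketch also runs in plain tclsh.
# Assumption: inside nsd the real command exists and this stub is skipped.
if {[llength [info commands ns_mutex]] == 0} {
    set ::held 0    ;# hypothetical counter: how often the stub lock is held
    proc ns_mutex {op args} {
        switch -- $op {
            create { return stub_mutex }
            lock   { incr ::held }
            unlock { incr ::held -1 }
        }
        return
    }
}

# Variant of ns_mutex_eval: always unlock, then pass the outcome along.
proc ns_mutex_eval {lock script} {
    ns_mutex lock $lock
    set code [catch {uplevel 1 $script} result]
    ns_mutex unlock $lock
    if {$code == 1} {
        # Real error: hand the original stack trace and error code upward.
        return -code error -errorinfo $::errorInfo \
            -errorcode $::errorCode $result
    }
    return -code $code $result
}

set counter_mutex [ns_mutex create]
array set counter_A {home 1}    ;# a plain array stands in for the shared one

# Reading a missing element fails inside the critical section.
set code [catch {
    ns_mutex_eval $counter_mutex {
        set snapshot $counter_A(no_such_page)
    }
} msg]

puts "catch code: $code"
puts "message:    $msg"
```

The point of the exercise is that the caller still sees the error, while the mutex has already been released by the time it does.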



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Nathan Folkman
In a message dated 1/27/2003 12:32:51 PM Eastern Standard Time, [EMAIL PROTECTED] writes:

sorry, there is no $ sign in the actual code.
 
So, is it worth trying to substitute ns_share with nvs stuff (nsv_set nsv_get) to see if the problem goes away ?
 
Thanks,
 Seena


I would definitely recommend using nsv's instead of ns_share variables, especially if you are running Tcl 8.x. For your application, you'll probably want to take a look at the nsv_incr command specifically.

Here's another tip which can help when dealing with lock contention. First, make sure you are creating named mutex locks:

ns_mutex create counter

Second, enable mutex metering in your AOLserver configuration. Be aware that this causes some additional lock contention itself, so I'd recommend only enabling this in a development environment:

ns_section "ns/threads"
ns_param mutexmeter on

Lastly, with mutex metering enabled, you can use the "ns_info" command from the control port or an .adp page to find out what locks are causing the most contention. The latest AOLserver 3.5.x release contains a web based stats interface that displays this information. Here's a little script which essentially does the same thing:

set results "NAME(ID): #LOCK, #BUSY, CONTENTION\n"

foreach lock [ns_info locks] {
    # Unpack the fields of one lock entry into named variables.
    foreach {name owner id nlock nbusy} $lock {
        if {$nbusy == 0} {
            set contention 0.0
        } else {
            set contention [expr {double($nbusy * 100.0 / $nlock)}]
        }
    }
    append results "${name}(${id}): ${nlock}, ${nbusy}, ${contention}%\n"
}

return $results

Hope this helps!

- Nathan


Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Seena Kasmai

The error catching concept is definitely wise. In fact I'll go ahead and put those in. This code is sort of legacy and old, but it's definitely worth revising.

The reason I don't think this is the source of the issue is that the same code works (and has been working) with the older version of AOLserver which we have been using for years. Although there might be another hole or a different config that is causing ns_mutex to show up as the problem.

Regarding the error handling in this code, as you see, the only thing between the lock/unlock calls is incrementing the arrays, and the database action takes place after unlocking. Since the existence of the arrays is also tested before attempting to use ns_mutex, I'm assuming that no error could cause the ns_mutex unlock to be skipped because of an exception; plus nothing shows up in the error log either.

That being said, I'll still try to put a catch block around the code between every ns_mutex lock/unlock pair.


I'd also like to try Nathan's mutexmeter solution to see if I find anything new.


Thanks for the advice,
Seena


-Original Message-
From: Andrew Piskorski [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 27, 2003 2:06 PM
To: [EMAIL PROTECTED]
Subject: Re: [AOLSERVER] ns_mutex is likely causing our AOL web server
to hung



On Mon, Jan 27, 2003 at 12:23:50PM -0600, Rob Mayoff wrote:


 So the next logical step (other than creating a test case) is to test
 the hypothesis that such is the case, by putting catch commands around
 your critical sections. For example, suppose the critical section looks
 like this:

 ns_mutex lock L
 SCRIPT
 ns_mutex unlock L

 Then you should change that to this:

 ns_mutex lock L
 set code [catch {
 SCRIPT
 } result]
 ns_mutex unlock L


This is good advice, and not just for debugging! When I run ANY code
with a mutex locked that could ever possibly error out, I always wrap
it in a catch to properly clean up the mutex on error. E.g.:


ns_mutex lock $data_mutex
if { [catch {
 error Foo!
} errmsg] } {
 # We caught an unexpected error while the mutex was locked, so
 # unlock the mutex, then re-throw the error:
 ns_mutex unlock $data_mutex
 global errorInfo
 set my_error $errorInfo
 error $my_error
}
ns_mutex unlock $data_mutex


--
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com





Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Nathan Folkman
In a message dated 1/27/2003 2:22:09 PM Eastern Standard Time, [EMAIL PROTECTED] writes:

Regarding the error handling in this code, as you see, the only thing is between the lock/unlock block is just incrementing the arrays, and also the database action takes places after unlocking. Since the existence of the arrays also is being tested and takes place before attempting to use ns_mutex, I'm assuming that no error could cause the ns_mutex unlock to be skipped because of an exception, plus nothing shows up in the error log either.

careful - you might have a race condition. consider this scenario:

THREAD 1:
- check for existence of array(key)
- lock
- do something with array(key)
- unlock

THREAD 2:
- unset array(key)

thread 2 could unset your array after you've checked for its existence, and before you did something with it. to fix the scenario above you'd need to lock around all access to your array and move the check for existence inside the lock as well:

THREAD 1:
- lock
- check for existence of array(key)
- do something with array(key)
- unlock

THREAD 2:
- lock
- unset array(key)
- unlock

better still is to catch and handle errors around code which acquires a mutex lock. this allows you to properly unlock and prevents dead lock situations where you've acquired a lock, an error occurs, and you never release the lock.
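
a minimal sketch of the corrected ordering, combined with a catch so the unlock always happens. everything named here is hypothetical: the proc bump, the global array counter (a real server would use the shared array, since a plain global is per-interpreter), and the ns_mutex stub, which is only defined when the real command is missing so the code also runs in plain tclsh:

```tcl
# Stand-in for AOLserver's ns_mutex, used only outside nsd (assumption).
if {[llength [info commands ns_mutex]] == 0} {
    proc ns_mutex {op args} {
        if {[string equal $op create]} { return stub_mutex }
    }
}

set counter_mutex [ns_mutex create]

# Hypothetical helper: increment counter($key), creating it when missing.
proc bump {key} {
    global counter counter_mutex
    ns_mutex lock $counter_mutex
    set code [catch {
        # The existence check runs while the lock is held, so no other
        # thread can unset the element between the check and the incr.
        if {![info exists counter($key)]} {
            set counter($key) 0
        }
        incr counter($key)
    } result]
    # Reached whether or not the body raised an error.
    ns_mutex unlock $counter_mutex
    if {$code != 0} {
        return -code error $result
    }
    return $result
}
```

any thread that unsets or resets the array would need to take the same mutex for this to hold.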

one other note about the nsv_incr command. in versions prior to 4.0 you need to first initialize the nsv array and variable you are incrementing:

nsv_set myArray counter 0
nsv_incr myArray counter

in 4.0 the nsv_incr will create and initialize the array and variable if it doesn't already exist:

nsv_incr myArray counter

- nathan




Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Seena Kasmai



Nathan - If you look at the code, it does lock before attempting any manipulation of that array.

#
ns_share counter_A
ns_share counter_B
ns_share -init { set counter_mutex [ns_mutex create] } counter_mutex

proc X {i} {
 ns_share counter_A
 ns_share counter_B
 ns_share counter_mutex

 ns_mutex lock $counter_mutex
 incr counter_A($i) 1
 incr counter_B($i) 1
 ns_mutex unlock $counter_mutex
}

proc_doc Y {} {
 ns_share counter_A
 ns_share counter_B
 ns_share counter_mutex

 ns_mutex lock $counter_mutex
 foreach i_index [array names counter_A] {
  set temp_counter_A($i_index) $counter_A($i_index)
  set temp_counter_B($i_index) $counter_B($i_index)
  unset counter_A($i_index)
  unset counter_B($i_index)
 }
 ns_mutex unlock $counter_mutex

 ## writing $temp_counter_A and $temp_counter_B arrays to database
}
#



Re: [AOLSERVER] ns_mutex is likely causing our AOL web server to hung

2003-01-27 Thread Nathan Folkman
In a message dated 1/27/2003 6:15:50 PM Eastern Standard Time, [EMAIL PROTECTED] writes:

Nathan - If you look at the code it does lock before attempting to any manipulation to that array. 

Just making sure. ;-) Any luck with the nsv_incr approach or any more data from a server running with mutex metering on?

- Nathan