Thanks Andrew for your input.
We use Solaris as well, and AOLserver seems to work fine in every other situation except when ns_mutex comes into play. Here are more details on how we are using it.
We use ns_mutex inside a scheduled proc that writes a cached array of numbers (counters) to the database. The proc is scheduled to run every 5 minutes. It locks the array so that no other process can modify it while it is being written to the db, writes the numbers to the db, resets the counters, and then unlocks the array using ns_mutex unlock.
Note that this array is ns_share'd. Everything seems to work fine at first, but once the webserver gets more traffic, we start seeing all the processes that have attempted to access that array waiting in the queue. At this stage the nsd process takes most of the CPU and the webserver almost stops responding to HTTP requests. If we stop the traffic, the server eventually (sometimes after a long time) returns to normal operation and the queue empties.
I modified the scheduled proc so that it does not lock the array (no ns_mutex use), and after this change the webserver never got into trouble. That's why I'm almost certain that ns_mutex is causing the problem.
I suspect that the combination of ns_share and ns_mutex on that array might be the cause. I also noticed that doing "upvar" on an ns_share'd variable doesn't work!
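For reference, the locking pattern described above can be sketched roughly as follows. This is a simplified illustration, not the production code: the names counters, counter_lock, flush_counters and the page_counters table are hypothetical, and it assumes a database pool named "main". It only runs inside AOLserver's Tcl interpreter, not in plain tclsh.

```tcl
# Hypothetical names throughout; intended for an AOLserver Tcl init file.

# Variables shared across all interpreters in the server.
ns_share counters
# Create the mutex exactly once, the first time the variable is shared.
ns_share -init {set counter_lock [ns_mutex create]} counter_lock

proc flush_counters {} {
    ns_share counters
    ns_share counter_lock

    # Assumes a database pool named "main".
    set db [ns_db gethandle main]

    ns_mutex lock $counter_lock
    # catch ensures the unlock below still runs if a database write fails.
    set failed [catch {
        foreach key [array names counters] {
            # Keys are assumed to be trusted identifiers, not user input.
            ns_db dml $db "update page_counters
                           set hits = hits + $counters($key)
                           where page_id = '$key'"
            set counters($key) 0
        }
    } err]
    ns_mutex unlock $counter_lock

    ns_db releasehandle $db
    if {$failed} {
        ns_log Error "flush_counters: $err"
    }
}

# Run every 5 minutes (300 seconds).
ns_schedule_proc 300 flush_counters
```

In this sketch the lock is held for the whole database write, so any request that touches the array during a flush has to wait until the write finishes.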
Any more input on this matter would be greatly appreciated.
Thanks
Seena
-----Original Message-----
From: Andrew Piskorski
To: [EMAIL PROTECTED]
Sent: 1/23/03 7:11 PM
Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to hung
On Thu, Jan 23, 2003 at 07:23:28PM -0500, Seena wrote:
> After setting up a new server (AOLserver 3.3.1 w/ TCL 8), it seems using
> the "ns_mutex" to lock array/list, while server is running, brings our site
> down. The same setup and code/application with AOLserver 2.3.3 w/ TCL 7,
> works fine. Any comment why/how this is happening ?
>
> I've heard we can use ns_rwlock instead of ns_mutex, would anyone recommend
> replacing ns_mutex with ns_rwlock ?
I've used ns_mutex pretty heavily with AOLserver 3.3+ad13 and Tcl
8.3.2 on Solaris, and I've never had any problems. If your nsd
process is dying, you must have something broken in your AOLserver,
although I've no idea what. Perhaps someone else here will, so you
should probably post a lot more details: Where you got your AOLserver
code, how you compiled it, what operating system, etc.
I've never used ns_rwlock, so I don't know about that. What exactly
are you using ns_mutex for? Are you using ns_share? Perhaps you
could avoid having to use ns_mutex at all by using nsv? Or are you
doing something that you REALLY need to use ns_mutex for, like using
ns_cond, or making several separate nsv operations atomic?
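As a rough illustration of the nsv approach, counters could be kept in an nsv array instead of an ns_share'd array with an explicit mutex. This is a sketch only: the names counters, bump_counter, flush_counters_nsv and the page_counters table are hypothetical, it assumes a database pool named "main", and it assumes each counter key has already been created with nsv_set at startup.

```tcl
# Hypothetical names throughout; runs inside AOLserver, not plain tclsh.

# At startup: create the counter keys (here a single example key).
nsv_set counters home 0

# Called from request-handling code; nsv_incr is atomic,
# so no explicit ns_mutex is needed for a single increment.
proc bump_counter {page_id} {
    nsv_incr counters $page_id
}

# Scheduled flush: reads each counter and subtracts what it read,
# rather than setting it to 0, so increments that arrive between
# the read and the reset are not lost.
proc flush_counters_nsv {} {
    set db [ns_db gethandle main]   ;# assumes a pool named "main"
    foreach key [nsv_array names counters] {
        set n [nsv_get counters $key]
        if {$n == 0} { continue }
        nsv_incr counters $key [expr {-$n}]
        # Keys are assumed to be trusted identifiers, not user input.
        ns_db dml $db "update page_counters
                       set hits = hits + $n
                       where page_id = '$key'"
    }
    ns_db releasehandle $db
}

# Run every 5 minutes (300 seconds).
ns_schedule_proc 300 flush_counters_nsv
```

Here no lock is held while the database write runs, so request threads are not blocked by the flush. If a write fails after the counter has been decremented, that count is lost, so real code would need to decide how to handle that case.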
Also, you said this problem "brings your site down", but in the
subject you said AOLserver is "hung"? What exactly is the failure
mode? Is your nsd process segfaulting? Or are you just deadlocking
threads such that AOLserver hangs there doing nothing?
--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com
