Hey Nathan!
 
Here is a simplified version of the code showing how we use ns_mutex in our application. Proc X is called a lot (more than 100 times a minute) across the applications, and proc Y is scheduled to run every ~5 minutes. The primary reason for using ns_mutex is to protect the counters' values from being accessed by other threads while they are being manipulated (incremented/written/cleared).
 
Please feel free to criticize this code as much as you can!
 
Again, we are seeing that AOLserver 3.3.1 gets into trouble after these procs are called heavily (eventually the server goes down). Just by taking out the ns_mutex lines, we have no problem! Previously we never had any problem running these on version 2.3.3.
 
Meanwhile, regarding ns_share: what is the major issue with it that leads people to discourage its use?
 
Thanks!
--Seena
 
#####################################
ns_share counter_A 
ns_share counter_B 
ns_share -init {set counter_mutex [ns_mutex create]} counter_mutex
 

proc X {i} {
 
 ns_share counter_A
 ns_share counter_B
 ns_share counter_mutex
 
 ns_mutex lock $counter_mutex
 
 incr counter_A($i) 1
 incr counter_B($i) 1
 
 ns_mutex unlock $counter_mutex

}
 

proc Y {} {
 
 ns_share counter_A
 ns_share counter_B
 ns_share counter_mutex
 
 ns_mutex lock $counter_mutex
 
 foreach i_index [array names counter_A] {
 
  set temp_counter_A($i_index) $counter_A($i_index)
  set temp_counter_B($i_index) $counter_B($i_index)
  
  unset counter_A($i_index)
  unset counter_B($i_index)
  
 }
 
 ns_mutex unlock $counter_mutex
 
 ## writing $temp_counter_A and $temp_counter_B arrays to database
 
}
#####################################
 
-----Original Message-----
From: Nathan Folkman [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 24, 2003 7:08 PM
To: [EMAIL PROTECTED]
Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to...

In a message dated 1/24/2003 4:47:20 PM Eastern Standard Time, [EMAIL PROTECTED] writes:

Any more inputs regarding this matter will greatly be appreciated.


Any chance you could provide a few snippets of code showing where you are locking and unlocking, and the work you are doing in between? Hard to tell what the problem is. If I had to guess, however, it sounds like you are deadlocked. Perhaps you are locking, throwing an uncaught error, and never unlocking? Or maybe you are just experiencing contention around your database, which is causing other requests to back up waiting for that resource... If you can provide some more detailed information, including anything odd you see in the server log, that would be great! You also might want to check the SYSLOG for any database errors which could point to the problem.
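
To make the "locking, throwing an uncaught error, and never unlocking" scenario concrete, here is a minimal sketch of a counter update that always releases the mutex. It assumes the ns_share'd counter_A, counter_B and counter_mutex variables from the code earlier in this thread. The info exists guards are an assumption about what the real code needs: in Tcl 8.3, incr on an array element that does not exist yet raises an error, which would leave the mutex held if nothing catches it.

```tcl
proc X {i} {
    ns_share counter_A
    ns_share counter_B
    ns_share counter_mutex

    ns_mutex lock $counter_mutex

    # catch traps any Tcl error raised inside the critical section,
    # so the unlock below runs no matter what happens in between.
    set failed [catch {
        # Assumption: counters may not exist yet for a new key.
        if {![info exists counter_A($i)]} { set counter_A($i) 0 }
        if {![info exists counter_B($i)]} { set counter_B($i) 0 }
        incr counter_A($i)
        incr counter_B($i)
    } err]

    ns_mutex unlock $counter_mutex

    # Report the failure only after the lock has been released.
    if {$failed} {
        ns_log Error "X: counter update failed: $err"
    }
}
```

The same pattern would apply to the scheduled proc: do only the in-memory copy and reset inside the catch, unlock, and write to the database afterwards so that slow database calls never happen while the mutex is held.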

Also, have you considered upgrading to at least AOLserver 3.4.2 or even better 3.5.1? Would need more information to know exactly what you are trying to do, but you might be able to use the nsv_incr command for your counters.

The nsv data structure is similar to ns_share variables in that you can share variables between multiple threads/interps. The nsv implementation is a lot cleaner, and handles all the synchronization for you. Plus, as I mentioned before, there's a nifty nsv_incr command specifically for things like counters. ns_share is not recommended, especially when running Tcl 8.x.
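
As a rough sketch of what the nsv approach could look like for these counters (not tested against 3.3.1): the array names counter_A and counter_B are carried over from the ns_share code, and the sketch assumes that nsv_incr creates a missing key starting from zero and that nsv_array names returns an empty list for an array that has not been written yet. Both assumptions should be checked against the AOLserver version in use.

```tcl
# Called on every request: no explicit mutex needed,
# nsv_incr does its own locking around the update.
proc X {i} {
    nsv_incr counter_A $i
    nsv_incr counter_B $i
}

# Scheduled flush. Instead of clearing the array, subtract exactly
# the amount that was read, so increments that arrive between the
# read and the subtract are not lost.
proc Y {} {
    foreach key [nsv_array names counter_A] {
        set n [nsv_get counter_A $key]
        # ... write $key and $n to the database here ...
        nsv_incr counter_A $key [expr {-$n}]
    }
    # counter_B would be flushed the same way.
}
```

Keys stay in the array with a value of 0 after a flush. If they have to be removed, or if counter_A and counter_B must be read as one consistent pair, that combination of nsv operations is not atomic on its own and would still need a mutex around it.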

- Nathan
--------------------------------------------------- 

Thanks Andrew for your input.

We use Solaris as well, and AOLserver seems to work fine in every other situation except when ns_mutex comes into play. Here are more details on how we are using it.

We use ns_mutex inside a scheduled proc, which writes a cached array of numbers (counters) to the database. This proc is scheduled to run every 5 minutes. It locks that array so that no other thread can manipulate it while it is being written to the db, writes the numbers to the db, resets the counters, and then unlocks the array using ns_mutex unlock.

Note that this array is ns_share'd. Everything seems to function happily, but once the webserver gets more traffic, we start seeing that all the threads that have attempted to access that array are waiting in the queue. At this stage the nsd process takes most of the CPU and the webserver almost stops responding to http requests. If we stop the traffic, eventually (sometimes after a long time) the server comes back to normal operation and the queue empties.

I modified that scheduled proc so that it no longer locks the array (no ns_mutex use), and after making this change the webserver never got into trouble. That's why I'm almost certain that ns_mutex is causing the problems.

I suspect the combination of ns_share and ns_mutex on that array might be the cause of this. I also noticed that doing "upvar" on an ns_share'd variable doesn't work!

Any more inputs regarding this matter will greatly be appreciated.

Thanks
Seena


-----Original Message-----
From: Andrew Piskorski
To: [EMAIL PROTECTED]
Sent: 1/23/03 7:11 PM
Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL webserver to hung

On Thu, Jan 23, 2003 at 07:23:28PM -0500, Seena wrote:

> After setting up a new server (AOLserver 3.3.1 w/ TCL 8), it seems using
> the "ns_mutex" to lock array/list, while server is running, brings our site
> down. The same setup and code/application with AOLserver 2.3.3 w/ TCL 7
> works fine. Any comment why/how this is happening ?
>
> I've heard we can use ns_rwlock instead of ns_mutex, would anyone recommend
> replacing ns_mutex with ns_rwlock ?

I've used ns_mutex pretty heavily with AOLserver 3.3+ad13 and Tcl
8.3.2 on Solaris, and I've never had any problems.  If your nsd
process is dying, you must have something broken in your AOLserver,
although I've no idea what.  Perhaps someone else here will, so you
should probably post a lot more details: Where you got your AOLserver
code, how you compiled it, what operating system, etc.

I've never used ns_rwlock, so I don't know about that.  What exactly
are you using ns_mutex for?  Are you using ns_share?  Perhaps you
could avoid having to use ns_mutex at all by using nsv?  Or are you
doing something that you REALLY need to use ns_mutex for, like using
ns_cond, or making several separate nsv operations atomic?

Also, you said this problem "brings your site down", but in the
subject you said AOLserver is "hung"?  What exactly is the failure
mode?  Is your nsd process segfaulting?  Or are you just deadlocking
threads such that AOLserver hangs there doing nothing?

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com

 
