Put a catch around the code you run while holding the mutex and you may find
a bug, for example incrementing an array element that doesn't exist or
unsetting an array element that doesn't exist. Without the ns_mutex calls,
the code may still blow up, but your server won't lock up, because no lock
is left held.
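Here is a minimal sketch of that advice applied to proc X from the code quoted below, assuming the same ns_share variables. The lock is released whether or not the body raises an error, and any error is logged with ns_log and re-raised. The existence checks are there because before Tcl 8.5, incr fails on an array element that doesn't exist yet.

```tcl
# Sketch: proc X with the locked region wrapped in catch, so that
# ns_mutex unlock always runs even if the body raises an error.
proc X {i} {
    ns_share counter_A
    ns_share counter_B
    ns_share counter_mutex

    ns_mutex lock $counter_mutex
    set failed [catch {
        # Before Tcl 8.5, incr raises an error on a missing element,
        # so create each element with 0 the first time it is seen.
        if {![info exists counter_A($i)]} { set counter_A($i) 0 }
        if {![info exists counter_B($i)]} { set counter_B($i) 0 }
        incr counter_A($i)
        incr counter_B($i)
    } err]
    ns_mutex unlock $counter_mutex

    # Report the error only after the lock has been released.
    if {$failed} {
        ns_log Error "X: counter update failed: $err"
        error $err $::errorInfo
    }
}
```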
Jim
>
> Hey Nathan!
>
> Here is a simplified version of the code, which shows how we are using
> ns_mutex in our application. Basically, proc X is called a lot (more than
> 100 times a minute) across the applications, and proc Y is scheduled to
> run every ~5 minutes. The primary reason for using ns_mutex here is to
> keep other threads from accessing the counters' values while they are
> being manipulated (incremented/written/cleared).
>
> Please feel free to criticize this code as much as you can!
>
> Again, we are seeing that AOLserver 3.3.1 gets into trouble after calling
> these procs heavily (eventually the server goes down). If we take out only
> the ns_mutex lines, we have no problem! We never had any problem running
> these on version 2.3.3.
>
> Meanwhile, regarding ns_share: what is the major issue with it that leads
> people to discourage its use?
>
> Thanks!
> --Seena
>
> #####################################
> ns_share counter_A
> ns_share counter_B
> ns_share -init { set counter_mutex [ns_mutex create] } counter_mutex
>
>
> proc X {i} {
>
> ns_share counter_A
> ns_share counter_B
> ns_share counter_mutex
>
> ns_mutex lock $counter_mutex
>
> incr counter_A($i) 1
> incr counter_B($i) 1
>
> ns_mutex unlock $counter_mutex
>
> }
>
>
> proc_doc Y {} {
>
> ns_share counter_A
> ns_share counter_B
> ns_share counter_mutex
>
> ns_mutex lock $counter_mutex
>
> foreach i_index [array names counter_A] {
>
> set temp_counter_A($i_index) $conter_A($i_index)
> set temp_counter_B($i_index) $conter_B($i_index)
>
> unset $conter_A($i_index)
> unset $conter_B($i_index)
>
> }
>
> ns_mutex unlock $counter_mutex
>
> ## writing $temp_counter_A and $temp_counter_B arrays to database
>
> }
>
> #####################################
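For comparison, here is a sketch of proc Y with the problems in the quoted version fixed, assuming the same ns_share setup. As quoted, $conter_A and $conter_B are misspelled, and `unset $conter_A($i_index)` passes the counter's value, not its name, to unset. Either mistake raises an error while the mutex is held, so ns_mutex unlock is never reached and every later call to X blocks on the lock. The sketch uses plain proc (ACS's proc_doc also expects a documentation string) and returns the snapshot where the original writes it to the database.

```tcl
# Sketch: proc Y with the variable names and unset calls corrected,
# and the locked region wrapped in catch so the mutex is always released.
proc Y {} {
    ns_share counter_A
    ns_share counter_B
    ns_share counter_mutex

    ns_mutex lock $counter_mutex
    set failed [catch {
        foreach i_index [array names counter_A] {
            set temp_counter_A($i_index) $counter_A($i_index)
            # unset takes the variable name, not its value.
            unset counter_A($i_index)
            # counter_B is normally updated together with counter_A,
            # but don't assume the element exists.
            if {[info exists counter_B($i_index)]} {
                set temp_counter_B($i_index) $counter_B($i_index)
                unset counter_B($i_index)
            }
        }
    } err]
    ns_mutex unlock $counter_mutex

    if {$failed} {
        ns_log Error "Y: counter snapshot failed: $err"
        error $err $::errorInfo
    }

    # The database write belongs here, after the lock is released.
    # This sketch just returns the two snapshots as key/value lists.
    return [list [array get temp_counter_A] [array get temp_counter_B]]
}
```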
>
>
> -----Original Message-----
> From: Nathan Folkman [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 24, 2003 7:08 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL
> webserver to...
>
>
> In a message dated 1/24/2003 4:47:20 PM Eastern Standard Time,
> [EMAIL PROTECTED] writes:
>
> > Any further input on this matter would be greatly appreciated.
>
> Any chance you could provide a few snippets of code showing where you are
> locking and unlocking, and the work you are doing in between? It's hard to
> tell what the problem is. If I had to guess, however, it sounds like you
> are deadlocked. Perhaps you are locking, throwing an uncaught error, and
> never unlocking? Or maybe you are just experiencing contention around your
> database, which is causing other requests to back up waiting for that
> resource. If you can provide more detailed information, including anything
> odd you see in the server log, that would be great! You might also want to
> check the SYSLOG for any database errors that could point to the problem.
>
> Also, have you considered upgrading to at least AOLserver 3.4.2, or even
> better 3.5.1? I would need more information to know exactly what you are
> trying to do, but you might be able to use the nsv_incr command for your
> counters.
>
> The nsv data structure is similar to ns_share variables in that you can
> share variables between multiple threads/interps. The nsv implementation is
> a lot cleaner and handles all the synchronization for you. Plus, as I
> mentioned before, there's a nifty nsv_incr command specifically for things
> like counters. ns_share is not recommended, especially when running Tcl 8.x.
>
> - Nathan
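A sketch of the nsv approach described above, under stated assumptions: counter_A and counter_B become nsv arrays, and `counter_state` is a hypothetical nsv array holding a mutex handle. Because nsv_incr on some versions may raise an error for a key that doesn't exist yet (an assumption; check your version's nsv documentation), new keys are created under the mutex and are never deleted. Y resets each counter by subtracting the value it just read with a negative nsv_incr, so an increment that arrives between the read and the subtraction stays in the counter for the next run instead of being lost, and Y needs no lock.

```tcl
# Sketch: counters in nsv arrays instead of ns_share.
# Assumptions: nsv_incr accepts a negative increment, and may raise an
# error for a missing key. "counter_state" is a hypothetical nsv array.

proc counters_init {} {
    # Call once at server startup, e.g. from a Tcl library file.
    if {![nsv_exists counter_state mutex]} {
        nsv_set counter_state mutex [ns_mutex create]
    }
}

proc counter_bump {arr key} {
    # Fast path: an existing key is incremented atomically by nsv_incr.
    if {![catch {nsv_incr $arr $key}]} {
        return
    }
    # Slow path: create the key under the mutex so two threads cannot
    # both initialize it and lose an increment.
    set mutex [nsv_get counter_state mutex]
    ns_mutex lock $mutex
    set failed [catch {
        if {[nsv_exists $arr $key]} {
            nsv_incr $arr $key
        } else {
            nsv_set $arr $key 1
        }
    } err]
    ns_mutex unlock $mutex
    if {$failed} {
        error $err $::errorInfo
    }
}

proc X {i} {
    counter_bump counter_A $i
    counter_bump counter_B $i
}

proc Y {} {
    set snapshot {}
    foreach arr {counter_A counter_B} {
        # Treat a missing nsv array as empty.
        if {[catch {nsv_array names $arr} keys]} {
            set keys {}
        }
        set values {}
        foreach key $keys {
            set n [nsv_get $arr $key]
            if {$n != 0} {
                # Subtract what was read rather than unsetting the key.
                nsv_incr $arr $key [expr {-$n}]
                lappend values $key $n
            }
        }
        lappend snapshot $values
    }
    # Write the snapshot to the database here, outside any lock.
    return $snapshot
}
```

The trade-off of never deleting keys is that the arrays keep one entry per distinct value of i ever seen.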
>
> ---------------------------------------------------
>
> Thanks, Andrew, for your input.
>
> We use Solaris as well, and AOLserver seems to work fine in every other
> situation except when ns_mutex comes into play. Here are more details on
> how we are using it.
>
> We use ns_mutex inside a scheduled proc that writes a cached array of
> numbers (counters) to the database. The proc is scheduled to run every 5
> minutes. It locks the array (so that no other thread can modify it while
> it is being written to the database), writes the numbers to the database,
> resets the counters, and then unlocks the array with ns_mutex unlock.
>
> Note that this array is ns_share'd. Everything seems to work fine at
> first, but once the web server gets more traffic, we start seeing that all
> the threads that have tried to access that array are waiting in the
> queue. At this stage the nsd process takes most of the CPU and the web
> server barely responds to HTTP requests. If we stop the traffic, the
> server eventually (sometimes after a long time) returns to normal
> operation and the queue empties.
>
> I modified only that scheduled proc so that it no longer locks the array
> (no ns_mutex calls), and after this change the web server never got into
> trouble. That's why I'm almost certain that ns_mutex is causing the
> problem.
>
> I suspect the combination of ns_share and ns_mutex on that array might be
> the cause. I also noticed that using "upvar" on an ns_share'd variable
> doesn't work!
>
> Any further input on this matter would be greatly appreciated.
>
> Thanks
> Seena
>
>
> -----Original Message-----
> From: Andrew Piskorski
> To: [EMAIL PROTECTED]
> Sent: 1/23/03 7:11 PM
> Subject: Re: [AOLSERVER] ns_mutex lock / unlock is likely causing our AOL
> webserver to hung
>
> On Thu, Jan 23, 2003 at 07:23:28PM -0500, Seena wrote:
>
> > After setting up a new server (AOLserver 3.3.1 w/ Tcl 8), it seems that
> > using "ns_mutex" to lock an array/list while the server is running brings
> > our site down. The same setup and code/application works fine with
> > AOLserver 2.3.3 w/ Tcl 7. Any comment on why/how this is happening?
> >
> > I've heard we can use ns_rwlock instead of ns_mutex. Would anyone
> > recommend replacing ns_mutex with ns_rwlock?
>
> I've used ns_mutex pretty heavily with AOLserver 3.3+ad13 and Tcl
> 8.3.2 on Solaris, and I've never had any problems. If your nsd
> process is dying, you must have something broken in your AOLserver,
> although I've no idea what. Perhaps someone else here will, so you
> should probably post a lot more details: where you got your AOLserver
> code, how you compiled it, what operating system, etc.
>
> I've never used ns_rwlock, so I don't know about that. What exactly
> are you using ns_mutex for? Are you using ns_share? Perhaps you
> could avoid having to use ns_mutex at all by using nsv? Or are you
> doing something that you REALLY need to use ns_mutex for, like using
> ns_cond, or making several separate nsv operations atomic?
>
> Also, you said this problem "brings your site down", but in the
> subject you said AOLserver is "hung"? What exactly is the failure
> mode? Is your nsd process segfaulting? Or are you just deadlocking
> threads such that AOLserver hangs there doing nothing?
>
> --
> Andrew Piskorski <[EMAIL PROTECTED]>
> http://www.piskorski.com
>