[AOLSERVER] async cancel

Stuart Children Fri, 04 Aug 2006 02:55:41 -0700

Hiya

I've been rebuilding our test systems with the new 4.5.0 release, and happily have had no issues with the upgrade (other than a couple of small build issues). One problem has come up, but it's taking advantage of a new feature. Some background first...

We have a "timeout" module, that spawns a monitor thread and watches connections to ensure they don't exceed either a default timeout, or one they set themselves. At the moment all we do is fire a signal handler which sets a global variable. We then monitor this in certain bits of code (primarily inside a customised nsoracle module which is using non-blocking mode - as this is typically where our scripts get held up), and raise errors when and where we can. However, using the new [ns_ictl cancel] command we can actually cover a lot more cases and for example break out of looping TCL code (caused by programmer error, or a condition that's unexpected failing to be met).
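For readers unfamiliar with the flag-polling approach described above, here is a minimal sketch of the general pattern in plain C. It is not the actual timeout module: the names timedOut, TimeoutSignal, InstallTimeoutHandler and RunWithPolling are hypothetical, SIGUSR1 is an arbitrary choice, and raise() merely stands in for the monitor thread firing the signal.

```c
#include <signal.h>

/* Hypothetical flag: set asynchronously by the handler, read by polling code. */
static volatile sig_atomic_t timedOut = 0;

/* Signal handler: it only sets the flag and does nothing else. */
static void
TimeoutSignal(int signo)
{
    (void) signo;
    timedOut = 1;
}

/* Install the handler. SIGUSR1 is an assumption made for this sketch. */
static int
InstallTimeoutHandler(void)
{
    return (signal(SIGUSR1, TimeoutSignal) == SIG_ERR) ? -1 : 0;
}

/*
 * Do up to maxSteps units of work, checking the flag before each unit.
 * At step signalAtStep the signal is raised to simulate the monitor
 * thread deciding that the connection has run too long.
 * Returns the number of units started before the timeout was noticed.
 */
static int
RunWithPolling(int maxSteps, int signalAtStep)
{
    int step;

    for (step = 0; step < maxSteps; step++) {
        if (timedOut) {
            break;              /* real code would raise an error here */
        }
        if (step == signalAtStep) {
            raise(SIGUSR1);     /* stand-in for the monitor thread */
        }
    }
    return step;
}
```

The limitation is visible in the loop: the timeout is only noticed at the points where the code chooses to look at the flag, which is why code that never polls (such as a runaway script) is not covered.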

To implement this I've just extracted the lines called by that subcommand from NsTclICtlObjCmd() in nsd/tclinit.c into a standalone function:


/* Implements ns_ictl cancel */
int
Ns_ICtlCancel(int threadid)
{
    Tcl_HashEntry *hPtr;
    TclData *dataPtr;

    Ns_MutexLock(&tlock);
    hPtr = Tcl_FindHashEntry(&threads, (char *) threadid);
    if (hPtr != NULL) {
        dataPtr = Tcl_GetHashValue(hPtr);
        Tcl_AsyncMark(dataPtr->cancel);
    }
    Ns_MutexUnlock(&tlock);
    if (hPtr == NULL) {
        return NS_ERROR;
    }
    return NS_OK;
}

I then simply call the above from inside our timeout module (we already have the thread id available, so it's a one line addition).

When I was testing this yesterday it was all working precisely as expected. Fantastic! However, overnight both of the servers I installed this on fell over - seemingly the first time the code was invoked after midnight. I'll try to replicate this later. However, looking at the core dumps:


#0  0x00a897a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00ac97d5 in raise () from /lib/tls/libc.so.6
#2  0x00acb149 in abort () from /lib/tls/libc.so.6
#3  0x001cbe3a in Abort (signal=11) at unix.c:365
#4  <signal handler called>
#5  0x00766db0 in ResetObjResult (iPtr=0x0)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclResult.c:824
#6  0x00766d30 in Tcl_ResetResult (interp=0x0)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclResult.c:787
#7  0x001b8ac1 in AsyncCancel (ignored=0x0, interp=0x0, code=0)
    at tclinit.c:2086
#8  0x006f9cc2 in Tcl_AsyncInvoke (interp=0x0, code=0)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclAsync.c:256
#9  0x00757e53 in Tcl_ServiceEvent (flags=-3)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclNotify.c:590
#10 0x00758305 in Tcl_DoOneEvent (flags=-3)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclNotify.c:945
#11 0x0072992f in Tcl_VwaitObjCmd (clientData=0x0, interp=0x9aabc20, objc=2, objv=0x970a390)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclEvent.c:1101
#12 0x006fc93a in TclEvalObjvInternal (interp=0x9aabc20, objc=2, objv=0x970a390, command=0x0, length=0, flags=0)
    at /tmp/stuartc_builddir/aolserver/tcl8.4.13/unix/../generic/tclBasic.c:3087

It's clear where the problem is - AsyncCancel has been passed NULL where it's expecting a valid interpreter pointer. So I've added a quick check to that for the time being, to simply log such events rather than crash the server. However, it's got a NULL because one was passed to Tcl_AsyncInvoke - in this case that looks pretty intentional, tclNotify.c:590 is:


        (void) Tcl_AsyncInvoke((Tcl_Interp *) NULL, 0);

and the comments in Tcl_AsyncInvoke() make it clear it expects to sometimes be called with a NULL.
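To make the interim "log instead of crash" check concrete, here is a minimal sketch of a handler with the same (clientData, interp, code) shape that tolerates a NULL interpreter. It is not the real AsyncCancel from tclinit.c: SketchInterp, SKETCH_OK, SKETCH_ERROR, skippedCancels and SketchAsyncCancel are hypothetical stand-ins so the block compiles without tcl.h, and the result string is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for Tcl_Interp; hypothetical, only here to keep the sketch
 * self-contained. */
typedef struct SketchInterp {
    const char *result;     /* plays the role of the interpreter result */
} SketchInterp;

/* Stand-ins for TCL_OK and TCL_ERROR. */
enum { SKETCH_OK = 0, SKETCH_ERROR = 1 };

/* Counts invocations that arrived without an interpreter. */
static int skippedCancels = 0;

/*
 * Async handler sketch: when no interpreter is supplied there is nowhere
 * to leave an error, so log the event and hand back the incoming code
 * unchanged. With an interpreter, set an error result and return an
 * error code so the running script is interrupted.
 */
static int
SketchAsyncCancel(void *ignored, SketchInterp *interp, int code)
{
    (void) ignored;

    if (interp == NULL) {
        skippedCancels++;
        fprintf(stderr, "async cancel: invoked without an interp, skipping\n");
        return code;
    }
    interp->result = "async cancel";    /* assumed message text */
    return SKETCH_ERROR;
}
```

Note that the sketch only avoids the NULL dereference. It makes no attempt to re-arm the cancel after a NULL invocation, so whether the script is still interrupted later is exactly the open question.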

Now I don't know all the details of why Tcl_AsyncInvoke is invoked with a NULL, and what we want to do in that situation... so I thought I'd ask here before trying to follow large amounts of the TCL and AOLserver source through by hand. :)

Clearly if that is valid behaviour by TCL, then AsyncCancel must be modified. Is it OK for us to look up our interpreter (Ns_TclGetConn + Ns_GetConnInterp)? If not... The return code is being ignored, so how can we cause an error - or will we be invoked again (this time with an interp), at which point we can do our job of provoking an error and so script cancellation?

TIA

PS: If other people are potentially interested in this module, then please let me know. It would also be good to know whether there would be any objections to modifying the core slightly (extending the conn struct and adding some API functions), so that the module can be better integrated. At present it requires that you (as an interpreter) add and remove yourself at the start and end of a connection. This is fine for us as we have an ns_register_proc'd TCL wrapper on all our requests anyway.


--
Stuart Children
http://terminus.co.uk/

