Hi,

It appears this crash is Tcl trying to free some per-thread context in a thread
that's exiting after Aolserver is done with its cleanup of Tcl.  Checking the
latest Aolserver source on GitHub shows a final call to Tcl_Finalize in
nsd/nsmain.c just before the return.  If you're running this version, try
commenting it out to see if the crash goes away.



More detail on what MAY be happening....


What ns_shutdown does is send a signal for the main thread to initiate
shutdown.  The main thread then signals all the subsystems (conn threads,
scheduler, etc.) to shut down, waits for those shutdowns to complete (i.e.,
for their threads to exit), and then does some final cleanup.  As threads
exit, they call various per-thread cleanup handlers, which rely on
per-process state, including the Tcl core.

From your stack trace, it looks like some thread, possibly created outside one
of these subsystems, is exiting after Aolserver thinks shutdown is complete and
after Aolserver has called Tcl_Finalize to tear down the Tcl core.  Such a
thread could be created by third-party code that calls pthread_create directly
(or even Ns_ThreadCreate directly) and then later calls into Aolserver.  While
Aolserver attempts to carefully manage all the threads it knows about, and
there's considerable code to gracefully signal and wait for these threads, it
can't really control when, if, or how these other threads exit.

It turns out the Aolserver API is designed to handle this situation to some
degree, but Tcl generally is not.  This is a symptom of the different
approaches to thread cleanup in Tcl and Aolserver.  Aolserver follows the
pthread model, which calls registered cleanup routines in order and then
tries again a few times if necessary, in case some cleanup accidentally
re-initializes a resource (see the comment in NsCleanupTls in
nsthread/tls.c).

Tcl instead provides various callback mechanisms for cleanup, and there's much
care and coordination in the Tcl core to ensure things are cleaned up in the
right order.  However, as Tcl is designed to be embedded in other code, this
level of care cannot be guaranteed outside the core.  My opinion is that it was
always unfortunate Tcl chose this model given its goals and constraints.
Another way to look at it is that in Aolserver, the correct order is a matter
of optimization, whereas in Tcl it's a necessity.

The pthread model, while not perfect, in practice always seemed more robust for
far fewer lines of code.  Admittedly, Aolserver doesn't care much, as exit
really is about graceful shutdown of transaction processing threads; the rest
is just aesthetics, as the _exit() will evaporate memory, open files, etc.,
efficiently and accurately.  Evidently Tcl has long operated in some embedded
systems where cleanup needed to be an actual cleanup.  These use cases
pre-dated threaded Tcl, and the old cleanup interfaces were extended for
threaded code instead of introducing a new model.

Anyway, as I could never really get Tcl cleanup to operate in a reliable way,
and because it didn't really matter to Aolserver, the call to Tcl_Finalize had
been commented out for years.  As this has become a recurring problem, I'd
suggest it should now be a config option, default off.  On the off chance
someone really needs Tcl_Finalize, they could turn the option on.

Of course you could have some other problem.  If this doesn't help, you could 
try compiling with symbols and poking around in the core dump for some more 
clues.


Cheers,
-Jim





On Mar 1, 2012, at 11:43 AM, Porter, Caroline wrote:

> tcl 8.5.9
>  
> Caroline
>  
> From: vgue...@gmail.com [mailto:vgue...@gmail.com] On Behalf Of Victor Guerra
> Sent: Thursday, March 01, 2012 10:36 AM
> To: Porter, Caroline
> Cc: aolserver-talk@lists.sourceforge.net
> Subject: Re: [AOLSERVER] Problem with ns_shutdown
>  
> Which version of tcl are you running? 
> 
> On Thu, Mar 1, 2012 at 3:08 PM, Porter, Caroline <cpor...@bna.com> wrote:
> We are shutting down aolserver via the control port using the ns_shutdown 
> command.  We are getting intermittent coredumps during the shutdown process.  
> Does anyone have any ideas as to how to resolve this?
>  
> Here’s some more info…
>  
> webserver log:
>  
> [29/Feb/2012:08:20:02][30350.82082672][-nscp:1-] Notice: nscp: 127.0.0.1 connected
> [29/Feb/2012:08:20:03][30350.82082672][-nscp:1-] Notice: nscp: nsadmin logged in
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: nsmain: AOLserver/4.5.1 stopping
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: driver: stopping: nssock
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: sched: shutdown pending
> [29/Feb/2012:08:20:04][30350.131660656][-socks-] Notice: socks: shutdown pending
> [29/Feb/2012:08:20:04][30350.4141099888][-sched-] Notice: sched: shutdown started
> [29/Feb/2012:08:20:04][30350.4141099888][-sched-] Notice: sched: waiting for event threads...
> [29/Feb/2012:08:20:04][30350.131660656][-socks-] Notice: nscp: shutdown
> [29/Feb/2012:08:20:04][30350.66386800][-sched:idle1-] Notice: exiting
> [29/Feb/2012:08:20:04][30350.148007792][-sched:idle0-] Notice: exiting
> [29/Feb/2012:08:20:04][30350.131660656][-socks-] Notice: socks: shutdown complete
> [29/Feb/2012:08:20:04][30350.56376176][-nssock:driver-] Notice: exiting
> [29/Feb/2012:08:20:04][30350.4141099888][-sched-] Notice: sched: shutdown complete
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: driver: stopped: nssock
> [29/Feb/2012:08:20:05][30350.82082672][-nscp:1-] Notice: nscp: 127.0.0.1 disconnected
> [29/Feb/2012:08:20:05][30350.56376176][-shutdown-] Notice: Shutdown called for server bwd
> [29/Feb/2012:08:20:05][30350.56376176][-shutdown-] Notice: nslog: closing '/data/bwd/logs/httpd_access_stg_delray.bna.com_5000.log'
> [29/Feb/2012:08:20:05][30350.4151592640][-main-] Notice: nsmain: AOLserver/4.5.1 exiting
> called Tcl_FindHashEntry on deleted table
>  
> Here’s what is in the coredump…
>  
> Program terminated with signal 6, Aborted.
> #0  0x0071d430 in __kernel_vsyscall ()
> #0  0x0071d430 in __kernel_vsyscall ()
> #1  0x0036ab71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #2  0x0036c44a in abort () at abort.c:92
> #3  0x002e8ddf in Tcl_PanicVA () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #4  0x002e8e04 in Tcl_Panic () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #5  0x002bccea in BogusFind () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #6  0x00304de1 in ThreadStorageGetHashTable () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #7  0x00304f0c in TclpThreadDataKeyGet () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #8  0x00303d28 in Tcl_GetThreadData () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #9  0x002e8545 in TclFreeObj () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #10 0x0030f8b0 in FreeVarEntry () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #11 0x002bc845 in Tcl_DeleteHashTable () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #12 0x0031052e in UnsetVarStruct () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #13 0x0031080f in TclDeleteNamespaceVars () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #14 0x002dfda8 in TclTeardownNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #15 0x002e0045 in Tcl_DeleteNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #16 0x002dfeab in TclTeardownNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #17 0x002e0045 in Tcl_DeleteNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #18 0x002dfeab in TclTeardownNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #19 0x002647a7 in DeleteInterpProc () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #20 0x002f47a4 in Tcl_EventuallyFree () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #21 0x00264702 in Tcl_DeleteInterp () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #22 0x0014dd2f in Ns_TclDestroyInterp () from /apps/bos-dev/bwd/lib/libnsd.so
> #23 0x0014e508 in DeleteData () from /apps/bos-dev/bwd/lib/libnsd.so
> #24 0x00ca6479 in NsCleanupTls () from /apps/bos-dev/bwd/lib/libnsthread.so
> #25 0x00ca81e2 in FreeThread () from /apps/bos-dev/bwd/lib/libnsthread.so
> #26 0x00174a8a in __nptl_deallocate_tsd (arg=0x4e47b70) at pthread_create.c:154
> #27 start_thread (arg=0x4e47b70) at pthread_create.c:308
> #28 0x0041cc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
>  
>  
>  
> 
> ------------------------------------------------------------------------------
> Virtualization & Cloud Management Using Capacity Planning
> Cloud computing makes use of virtualization - but cloud computing
> also focuses on allowing computing to be delivered as a service.
> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> _______________________________________________
> aolserver-talk mailing list
> aolserver-talk@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/aolserver-talk
> 
> 
> 
>  
> -- 
> -vg
