Hi,
It appears this crash is Tcl trying to free some per-thread context in a thread
that's exiting after Aolserver is done with its cleanup of Tcl. Checking the
latest Aolserver source on GitHub shows a final call to Tcl_Finalize in
nsd/nsmain.c just before the return. If you're running this version, try
commenting it out to see if the crash goes away.
More detail on what MAY be happening...
ns_shutdown signals the main thread to initiate shutdown. The main thread then
signals all the subsystems (conn threads, scheduler, etc.) to shut down, waits
for those shutdowns to complete (i.e., for the threads to exit), and then does
some final cleanup. As threads exit, they call various per-thread cleanup
handlers that rely on per-process state, including the Tcl core.
From your stack trace, it looks like some thread, possibly created outside of
these subsystems, is exiting after Aolserver thinks shutdown is complete and
after Aolserver has called Tcl_Finalize to tear down the Tcl core. The bottom
frames (__nptl_deallocate_tsd calling FreeThread, then NsCleanupTls, down to
Ns_TclDestroyInterp) show an interp being deleted during that thread's exit.
Such a thread could be created by third-party code that calls pthread_create
directly (or even Ns_ThreadCreate directly) and later calls into Aolserver.
While Aolserver attempts to carefully manage all the threads it knows about,
with considerable code to gracefully signal and wait for them, it can't really
control when, if, or how these other threads exit.
It turns out the Aolserver API is designed to handle this situation somewhat,
but Tcl generally is not. This is a symptom of the different approaches to
thread cleanup in Tcl and Aolserver. Aolserver follows the pthread model, which
calls the registered cleanup routines in order and then repeats the pass a few
times if necessary, in case some cleanup accidentally re-initializes a resource
(see the comment in NsCleanupTls in nsthread/tls.c).
Tcl instead provides various callback mechanisms for cleanup, and there's much
care and coordination in the Tcl core to ensure things are cleaned up in the
right order. However, because Tcl is designed to be embedded in other code, that
level of care cannot be guaranteed outside the core. In my opinion, it was
always unfortunate that Tcl chose this model given its goals and constraints.
Another way to look at it: in Aolserver, the correct order is a matter of
optimization, whereas in Tcl it's a necessity.
The pthread model, while not perfect, in practice always seemed more robust for
far fewer lines of code. Admittedly, Aolserver doesn't care much, since exit is
really about the graceful shutdown of transaction-processing threads; the rest
is just aesthetics, because _exit() will evaporate memory, open files, etc.,
efficiently and accurately. Evidently, Tcl has long operated in some embedded
systems where cleanup needed to be an actual cleanup. These use cases predated
threaded Tcl, and the old cleanup interfaces were extended for threaded code
instead of introducing a new model.
Anyway, since I could never get Tcl cleanup to operate reliably, and because it
didn't really matter to Aolserver, the call to Tcl_Finalize had been commented
out for years. As this has become a recurring problem, I'd suggest it should
now be a config option, default off. In the off chance someone really needs
Tcl_Finalize, they could turn the option on.
Of course, you could have some other problem. If this doesn't help, try
compiling with debugging symbols and poking around in the core dump for more
clues.
Cheers,
-Jim
On Mar 1, 2012, at 11:43 AM, Porter, Caroline wrote:
> tcl 8.5.9
>
> Caroline
>
> From: vgue...@gmail.com [mailto:vgue...@gmail.com] On Behalf Of Victor Guerra
> Sent: Thursday, March 01, 2012 10:36 AM
> To: Porter, Caroline
> Cc: aolserver-talk@lists.sourceforge.net
> Subject: Re: [AOLSERVER] Problem with ns_shutdown
>
> Which version of tcl are you running?
>
> On Thu, Mar 1, 2012 at 3:08 PM, Porter, Caroline <cpor...@bna.com> wrote:
> We are shutting down aolserver via the control port using the ns_shutdown
> command. We are getting intermittent coredumps during the shutdown process.
> Does anyone have any ideas as to how to resolve this?
>
> Here’s some more info…
>
> webserver log:
>
> [29/Feb/2012:08:20:02][30350.82082672][-nscp:1-] Notice: nscp: 127.0.0.1 connected
> [29/Feb/2012:08:20:03][30350.82082672][-nscp:1-] Notice: nscp: nsadmin logged in
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: nsmain: AOLserver/4.5.1 stopping
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: driver: stopping: nssock
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: sched: shutdown pending
> [29/Feb/2012:08:20:04][30350.131660656][-socks-] Notice: socks: shutdown pending
> [29/Feb/2012:08:20:04][30350.4141099888][-sched-] Notice: sched: shutdown started
> [29/Feb/2012:08:20:04][30350.4141099888][-sched-] Notice: sched: waiting for event threads...
> [29/Feb/2012:08:20:04][30350.131660656][-socks-] Notice: nscp: shutdown
> [29/Feb/2012:08:20:04][30350.66386800][-sched:idle1-] Notice: exiting
> [29/Feb/2012:08:20:04][30350.148007792][-sched:idle0-] Notice: exiting
> [29/Feb/2012:08:20:04][30350.131660656][-socks-] Notice: socks: shutdown complete
> [29/Feb/2012:08:20:04][30350.56376176][-nssock:driver-] Notice: exiting
> [29/Feb/2012:08:20:04][30350.4141099888][-sched-] Notice: sched: shutdown complete
> [29/Feb/2012:08:20:04][30350.4151592640][-main-] Notice: driver: stopped: nssock
> [29/Feb/2012:08:20:05][30350.82082672][-nscp:1-] Notice: nscp: 127.0.0.1 disconnected
> [29/Feb/2012:08:20:05][30350.56376176][-shutdown-] Notice: Shutdown called for server bwd
> [29/Feb/2012:08:20:05][30350.56376176][-shutdown-] Notice: nslog: closing '/data/bwd/logs/httpd_access_stg_delray.bna.com_5000.log'
> [29/Feb/2012:08:20:05][30350.4151592640][-main-] Notice: nsmain: AOLserver/4.5.1 exiting
> called Tcl_FindHashEntry on deleted table
>
> Here’s what is in the coredump…
>
> Program terminated with signal 6, Aborted.
> #0 0x0071d430 in __kernel_vsyscall ()
> #0 0x0071d430 in __kernel_vsyscall ()
> #1 0x0036ab71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #2 0x0036c44a in abort () at abort.c:92
> #3 0x002e8ddf in Tcl_PanicVA () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #4 0x002e8e04 in Tcl_Panic () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #5 0x002bccea in BogusFind () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #6 0x00304de1 in ThreadStorageGetHashTable () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #7 0x00304f0c in TclpThreadDataKeyGet () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #8 0x00303d28 in Tcl_GetThreadData () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #9 0x002e8545 in TclFreeObj () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #10 0x0030f8b0 in FreeVarEntry () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #11 0x002bc845 in Tcl_DeleteHashTable () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #12 0x0031052e in UnsetVarStruct () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #13 0x0031080f in TclDeleteNamespaceVars () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #14 0x002dfda8 in TclTeardownNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #15 0x002e0045 in Tcl_DeleteNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #16 0x002dfeab in TclTeardownNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #17 0x002e0045 in Tcl_DeleteNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #18 0x002dfeab in TclTeardownNamespace () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #19 0x002647a7 in DeleteInterpProc () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #20 0x002f47a4 in Tcl_EventuallyFree () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #21 0x00264702 in Tcl_DeleteInterp () from /apps/bos-dev/bwd/lib/libtcl8.5.so
> #22 0x0014dd2f in Ns_TclDestroyInterp () from /apps/bos-dev/bwd/lib/libnsd.so
> #23 0x0014e508 in DeleteData () from /apps/bos-dev/bwd/lib/libnsd.so
> #24 0x00ca6479 in NsCleanupTls () from /apps/bos-dev/bwd/lib/libnsthread.so
> #25 0x00ca81e2 in FreeThread () from /apps/bos-dev/bwd/lib/libnsthread.so
> #26 0x00174a8a in __nptl_deallocate_tsd (arg=0x4e47b70) at pthread_create.c:154
> #27 start_thread (arg=0x4e47b70) at pthread_create.c:308
> #28 0x0041cc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
>
>
>
>
> ------------------------------------------------------------------------------
> Virtualization & Cloud Management Using Capacity Planning
> Cloud computing makes use of virtualization - but cloud computing
> also focuses on allowing computing to be delivered as a service.
> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> _______________________________________________
> aolserver-talk mailing list
> aolserver-talk@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/aolserver-talk
>
>
>
>
> --
> -vg