It seems our server crashes were caused by my recompiling AOLserver
with -O2 on Linux. When compiled with -g instead, the server doesn't
crash, and the test case runs reliably at the same speed - actually,
a little faster.

I'm guessing that gcc doesn't realize the static vars in AOLserver
should all be treated as volatile, so an optimization is causing
unintended behavior. Until those vars are declared volatile, maybe
there should be a note in the makefile warning against optimization.
Since -O2 was sitting in Makefile.global, commented out with no
explanation, I figured it was okay to use it. Wrong!
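
To illustrate the guess (this is not AOLserver code), here is a minimal
C sketch of the kind of problem volatile guards against. The names
timer_fired, on_alarm, and wait_for_timer are made up for the example,
and a timer signal stands in for whatever changes the variable behind
the compiler's back. Keep in mind that volatile only stops the compiler
from caching the value; it doesn't by itself make access from multiple
threads safe, so it may not be the whole story for the crash.

```c
#define _XOPEN_SOURCE 700   /* expose setitimer() under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical flag for illustration; not an AOLserver variable.
 * volatile forces a real memory read on every loop iteration. */
static volatile sig_atomic_t timer_fired = 0;

/* Runs asynchronously, outside the normal control flow of the loop. */
static void on_alarm(int signo)
{
    (void)signo;
    timer_fired = 1;
}

/* Spin until a one-shot 10 ms timer fires.
 * Returns 1 once the flag is seen, or -1 if the timer could not be armed. */
int wait_for_timer(void)
{
    struct itimerval t;

    timer_fired = 0;
    if (signal(SIGALRM, on_alarm) == SIG_ERR)
        return -1;

    t.it_interval.tv_sec = 0;    /* no repeat */
    t.it_interval.tv_usec = 0;
    t.it_value.tv_sec = 0;
    t.it_value.tv_usec = 10000;  /* fire once after 10 ms */
    if (setitimer(ITIMER_REAL, &t, NULL) != 0)
        return -1;

    while (!timer_fired) {
        /* Without volatile, -O2 may read timer_fired once and spin forever. */
    }

    assert(timer_fired == 1);    /* postcondition: the handler has run */
    return 1;
}
```

Built with gcc -O2, wait_for_timer() returns once the timer fires. If
the volatile qualifier is dropped, the optimizer is allowed to read the
flag a single time before the loop, and the loop may then never exit.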

Jim
>
> I narrowed this nagging server crash bug down today to a test case:
>
> ns_share sharevar
> set sharevar(1) 1
> if {[info exists sharevar(2)]} {
> }
> ns_return 200 text/plain "hi"
>
> Putting this Tcl script in the pageroot with no other modules loaded
> and hitting it about 800 times per second from another server, using
> ns_geturl in 25 threads (both machines are dual-processor), triggers
> the crash within a few seconds. It may be reproducible on a single
> machine; I didn't try that.
>
> Removing the subscript from the "info exists" variable prevents the
> crash.
>
> I'm no TCL (7.6) internals expert... Tips on where to look to fix this
> are most welcome.
>
> TIA,
> Jim
>