On Mon, Oct 28, 2002 at 05:58:44PM +0200, Zoran Vasiljevic wrote:
> Again, watch for the hidden usage of functions doing something
> with the process environment. Tcl and AOLserver are
> taking care of that (Tcl has a bug, as I mentioned, though)
> but your other code may not.
I have a feeling that the answer to my segfault problems has been
sitting in front of my nose all along. To recap, I have a vendor
library, of unknown thread-safe-ness, which I'm using in an AOLserver
loadable module. Despite serializing all access to the vendor library
with a mutex, I still occasionally get weird segfaults in _smalloc,
nasty Purify errors, and other odd behavior. All
symptoms point to heap corruption.
Turns out, *ALL* the Array Bounds Write, Free Memory Write and Read,
and other nasty Purify errors (i.e., not just UMR) come from the "tzcpy"
function, which is called ultimately from localtime_u, localtime_r,
mktime, or strftime.
Zoran explained that the "clock format -gmt 1" thread-safety problem he
patched in Tcl was caused by an unserialized tzset() writing over the TZ
environment variable and messing up the process env array.
That sounds pretty similar to my problem. I applied Zoran's patch
to my AOLserver Tcl, but of course my vendor library is not using ANY
of Tcl's mutex locking.
Clearly, anytime thread-safety is achieved by mutex locking a
non-thread-safe library (rather than using a thread-safe library to
begin with) that mutex must be used by ALL code using it. I don't
have the source to my vendor library, but I DO have the source to all
of AOLserver and Tcl, so I should be able to achieve this.
I tried it. In aolserver/tcl8.3.2/generic/tclClock.c, I removed the
"static" from the declaration of clockMutex (TCL_DECLARE_MUTEX expands to
a static Tcl_Mutex in threaded builds), like this:
/* TCL_DECLARE_MUTEX(clockMutex) */
Tcl_Mutex clockMutex;
Without the static, the linker exports the symbol, so I can access
clockMutex from my own loadable module. In mymodule.h I added:
typedef struct Tcl_Mutex_ *Tcl_Mutex;
extern Tcl_Mutex clockMutex;
And then every single time I locked the mutex around my vendor
library, I also locked clockMutex.
Unfortunately, it didn't seem to help much. After running for a while, I
still get plenty of FUM, FMR, FMW, and ABR errors in Purify; see the
examples below.
Thoughts? Is my strategy flawed in general, or are there just other
mutexes I need to lock?
Does the TZ env array issue occur anywhere else in AOLserver or Tcl,
or are there any other non-thread-safe library calls I should be
looking out for?
(If I had a good list of all non-thread-safe library calls, maybe I
could run truss on everything to find everywhere they're being used...)
Examples of Purify errors:
--------------------------
Here I am locking clockMutex around all access to the closed-source
vendor library:
AOLserver code:
FUM: Freeing unallocated memory (3 times):
* This is occurring while in thread 14:
free [rtlib.o]
tzcpy [time_comm.c]
getzname [time_comm.c]
_ltzset_u [time_comm.c]
localtime_u [time_comm.c]
localtime_r [libc.so.1]
ns_localtime [reentrant.c:140]
Ns_LogTime2 [log.c:212]
Log [log.c:405]
ns_serverLog [log.c:144]
Ns_Log [log.c:153]
TclLog [log.c:352]
AOLserver code:
FMR: Free memory read:
* This is occurring while in thread 10:
strlen [rtlib.o]
tzcpy [time_comm.c]
getzname [time_comm.c]
_ltzset_u [time_comm.c]
mktime [libc.so.1]
strftime [libc.so.1]
Ns_LogTime2 [log.c:213]
Vendor code:
FMW: Free memory write (280 times):
* This is occurring while in thread 9:
strncpy [rtlib.o]
tzcpy [time_comm.c]
_ltzset_u [time_comm.c]
localtime_u [time_comm.c]
Vendor code:
FMW: Free memory write (280 times):
* This is occurring while in thread 9:
tzcpy [time_comm.c]
_ltzset_u [time_comm.c]
localtime_u [time_comm.c]
AOLserver code:
ABR: Array bounds read:
* This is occurring while in thread 9:
strlen [rtlib.o]
tzcpy [time_comm.c]
getzname [time_comm.c]
_ltzset_u [time_comm.c]
localtime_u [time_comm.c]
localtime_r [libc.so.1]
ns_localtime [reentrant.c:140]
Ns_LogTime2 [log.c:212]
--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com