On Mon, Oct 28, 2002 at 05:58:44PM +0200, Zoran Vasiljevic wrote:

> Again, watch for the hidden usage of functions doing something
> with the process environment. Tcl and AOLserver are
> taking care of that (Tcl has a bug, as I mentioned, though)
> but your other code may not.

I have a feeling that the answer to my segfault problems has been
sitting in front of my nose all along.  To recap, I have a vendor
library, of unknown thread-safe-ness, which I'm using in an AOLserver
loadable module.  Despite serializing all access to the vendor library
with a mutex, I still occasionally get weird segfaults in _smalloc,
nasty Purify errors, and occasional other weird behavior.  All
symptoms point to heap corruption.

Turns out, *ALL* the Array Bounds Write, Free Memory Write and Read,
and other nasty Purify errors (i.e., not just UMR) are from the "tzcpy"
function, which is called ultimately from localtime_u, localtime_r,
mktime, or strftime.

Zoran explained that the "clock format -gmt 1" thread-safety problem he
patched in Tcl was caused by an unserialized tzset() overwriting the
TZ environment variable and corrupting the process env array.
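
To make that failure mode concrete, here is a minimal sketch of the
"temporarily switch TZ" pattern with the serialization added.  It uses
plain pthreads rather than Tcl's own mutex API, and env_lock and
format_gmt_via_tz are hypothetical names; this is not Zoran's actual
patch, only an illustration of the idea:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical lock guarding every read or write of TZ in the process. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Format t as GMT by temporarily setting TZ, then restore the old value.
 * Without env_lock, a second thread calling tzset() or localtime_r() in
 * the middle of this would see the environment while it is being changed.
 * Returns the number of bytes written, or 0 on failure.
 */
size_t format_gmt_via_tz(time_t t, char *buf, size_t len)
{
    struct tm tm;
    size_t n = 0;
    const char *old;
    char *saved = NULL;

    assert(buf != NULL && len > 0);

    pthread_mutex_lock(&env_lock);

    /* Copy the old value: setenv() may invalidate the getenv() pointer. */
    old = getenv("TZ");
    if (old != NULL) {
        saved = strdup(old);
        if (saved == NULL) {
            pthread_mutex_unlock(&env_lock);
            return 0;
        }
    }

    setenv("TZ", "GMT", 1);
    tzset();
    if (localtime_r(&t, &tm) != NULL) {
        n = strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
    }

    /* Put the environment back exactly as it was found. */
    if (saved != NULL) {
        setenv("TZ", saved, 1);
        free(saved);
    } else {
        unsetenv("TZ");
    }
    tzset();

    pthread_mutex_unlock(&env_lock);
    return n;
}
```

The lock only helps if every other thread that touches TZ or calls the
time functions takes the same lock.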

Which sounds pretty similar to my problem...  I applied Zoran's patch
to my AOLserver Tcl, but of course my vendor library is not using ANY
of Tcl's mutex locking.

Clearly, anytime thread-safety is achieved by mutex locking a
non-thread-safe library (rather than using a thread-safe library to
begin with) that mutex must be used by ALL code using it.  I don't
have the source to my vendor library, but I DO have the source to all
of AOLserver and Tcl, so I should be able to achieve this.
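
A minimal sketch of that rule, assuming plain pthreads: one
process-wide lock, and every caller of the libc time functions goes
through a wrapper that holds it.  The names tz_lock and
locked_format_time are hypothetical and appear in neither AOLserver
nor Tcl:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical process-wide lock shared by ALL users of the time calls. */
static pthread_mutex_t tz_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Convert and format a timestamp while holding tz_lock.
 * Returns the number of bytes written to buf, or 0 on failure.
 */
size_t locked_format_time(time_t t, char *buf, size_t len)
{
    struct tm tm;
    size_t n = 0;

    assert(buf != NULL && len > 0);

    pthread_mutex_lock(&tz_lock);
    if (localtime_r(&t, &tm) != NULL) {
        n = strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
    }
    pthread_mutex_unlock(&tz_lock);
    return n;
}
```

A single caller that reaches localtime_r, mktime, or strftime without
taking tz_lock is enough to defeat the scheme, which is exactly the
difficulty with a closed-source library.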

I tried it.  In aolserver/tcl8.3.2/generic/tclClock.c I removed the
"static" from the declaration of clockMutex, like this:

  /* TCL_DECLARE_MUTEX(clockMutex) */
  Tcl_Mutex clockMutex;

That way, without the static the linker exports it and I can access
clockMutex from my own loadable module.  In my mymodule.h I added:

  typedef struct Tcl_Mutex_ *Tcl_Mutex;
  extern Tcl_Mutex clockMutex;

And then every single time I locked the mutex around my vendor
library, I also locked clockMutex.
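
The double locking looks roughly like the sketch below.  It uses
pthread mutexes as stand-ins for clockMutex and my module's own mutex,
and call_vendor_locked and example_vendor_call are hypothetical names.
The one thing to get right is always taking the two locks in the same
order, so that two threads cannot deadlock against each other:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-ins for Tcl's clockMutex and the module's own vendor mutex. */
static pthread_mutex_t clock_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vendor_lock = PTHREAD_MUTEX_INITIALIZER;

typedef int (*vendor_fn)(void *arg);

/*
 * Run one vendor-library call with both locks held.
 * Lock order is fixed: clock_lock first, then vendor_lock.
 * Unlocking happens in the reverse order.
 */
int call_vendor_locked(vendor_fn fn, void *arg)
{
    int rc;

    assert(fn != NULL);

    pthread_mutex_lock(&clock_lock);
    pthread_mutex_lock(&vendor_lock);
    rc = fn(arg);
    pthread_mutex_unlock(&vendor_lock);
    pthread_mutex_unlock(&clock_lock);
    return rc;
}

/* Placeholder for a real vendor call: increments the int behind arg. */
int example_vendor_call(void *arg)
{
    int *p = arg;
    return ++*p;
}
```

This only serializes the vendor library against code that also takes
clock_lock; any time call made without it is still unprotected.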

Unfortunately, it didn't seem to help much.  After I run for a while I
still start getting plenty of FUM, FMR, FMW, ABR errors in Purify -
see examples below.

Thoughts?  Is my strategy flawed in general, or are there just other
mutexes I need to lock?

Does the TZ env array issue occur anywhere else in AOLserver or Tcl,
or are there any other non-thread-safe library calls I should be
looking out for?

(If I had a good list of all non-thread-safe library calls, maybe I
could run truss on everything to find everywhere they're being used...)


Examples of Purify errors:
--------------------------
Here I am locking clockMutex around all access to the closed-source
vendor library:

AOLserver code:
FUM: Freeing unallocated memory (3 times):
  * This is occurring while in thread 14:
        free           [rtlib.o]
        tzcpy          [time_comm.c]
        getzname       [time_comm.c]
        _ltzset_u      [time_comm.c]
        localtime_u    [time_comm.c]
        localtime_r    [libc.so.1]
        ns_localtime   [reentrant.c:140]
        Ns_LogTime2    [log.c:212]
        Log            [log.c:405]
        ns_serverLog   [log.c:144]
        Ns_Log         [log.c:153]
        TclLog         [log.c:352]

AOLserver code:
FMR: Free memory read:
  * This is occurring while in thread 10:
        strlen         [rtlib.o]
        tzcpy          [time_comm.c]
        getzname       [time_comm.c]
        _ltzset_u      [time_comm.c]
        mktime         [libc.so.1]
        strftime       [libc.so.1]
        Ns_LogTime2    [log.c:213]

Vendor code:
FMW: Free memory write (280 times):
  * This is occurring while in thread 9:
        strncpy        [rtlib.o]
        tzcpy          [time_comm.c]
        _ltzset_u      [time_comm.c]
        localtime_u    [time_comm.c]

Vendor code:
FMW: Free memory write (280 times):
  * This is occurring while in thread 9:
        tzcpy          [time_comm.c]
        _ltzset_u      [time_comm.c]
        localtime_u    [time_comm.c]

AOLserver code:
ABR: Array bounds read:
  * This is occurring while in thread 9:
        strlen         [rtlib.o]
        tzcpy          [time_comm.c]
        getzname       [time_comm.c]
        _ltzset_u      [time_comm.c]
        localtime_u    [time_comm.c]
        localtime_r    [libc.so.1]
        ns_localtime   [reentrant.c:140]
        Ns_LogTime2    [log.c:212]

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com
