Dear all,

This problem is now fixed on bitbucket. The problem occurred
when one thread freed an nsv array while the internal representation
of a Tcl_Obj for this array in another thread still contained a
pointer to the (freed) array. It seems that whole nsv arrays
are not freed very often in applications.

The bug was introduced many years ago, when we started to use
TclSetOpaqueObj() for Array structures. It did no harm for a very
long time, since the arrays were never freed (which caused
a memory leak). The problem became acute through a change of mine
in Nov 2014 that fixed this memory leak for unset nsv array structures.

After the change, we now use less aggressive caching and
store just the bucket pointer in the internal representation of
the Tcl_Obj. To get full caching of the array as before,
the best approach would probably be to introduce a new
Tcl_Obj type that uses an epoch on the bucket to
validate the array structure.
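
To make the epoch idea concrete, here is a minimal sketch in plain C.
It is not the NaviServer code and does not use the Tcl API: the types
Bucket, Array and CachedRef and all function names are hypothetical
stand-ins, and the bucket lock that real code would hold around these
calls is left out. It only shows how a cached array pointer could be
validated against a per-bucket epoch that is bumped whenever an array
is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_SLOTS 8

/* Hypothetical, simplified stand-ins; NOT the NaviServer structures. */
typedef struct Array {
    char name[32];
    int  value;
} Array;

typedef struct Bucket {
    unsigned long epoch;               /* bumped whenever an array is freed */
    Array        *arrays[BUCKET_SLOTS];
} Bucket;

/* What the internal representation of an object could cache. */
typedef struct CachedRef {
    Bucket       *bucketPtr;
    Array        *arrayPtr;            /* may be stale; check epoch first */
    unsigned long epoch;               /* bucket epoch when arrayPtr was cached */
} CachedRef;

static Array *BucketLookup(Bucket *b, const char *name) {
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        if (b->arrays[i] != NULL && strcmp(b->arrays[i]->name, name) == 0) {
            return b->arrays[i];
        }
    }
    return NULL;
}

static Array *BucketCreate(Bucket *b, const char *name, int value) {
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        if (b->arrays[i] == NULL) {
            Array *a = calloc(1, sizeof(*a));
            if (a == NULL) {
                return NULL;
            }
            strncpy(a->name, name, sizeof(a->name) - 1); /* calloc keeps the NUL */
            a->value = value;
            b->arrays[i] = a;
            return a;
        }
    }
    return NULL;
}

static void BucketUnset(Bucket *b, const char *name) {
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        if (b->arrays[i] != NULL && strcmp(b->arrays[i]->name, name) == 0) {
            free(b->arrays[i]);
            b->arrays[i] = NULL;
            b->epoch++;                /* invalidates every cached pointer */
            return;
        }
    }
}

/* Return the cached pointer only if the bucket epoch is unchanged;
 * otherwise look the array up by name again and refresh the cache. */
static Array *Resolve(CachedRef *ref, const char *name) {
    assert(ref->bucketPtr != NULL);
    if (ref->arrayPtr != NULL && ref->epoch == ref->bucketPtr->epoch) {
        return ref->arrayPtr;          /* fast path: cache is still valid */
    }
    ref->arrayPtr = BucketLookup(ref->bucketPtr, name);
    ref->epoch = ref->bucketPtr->epoch;
    return ref->arrayPtr;
}

/* Single-threaded walk-through; returns 1 if the stale cache was detected. */
int DemoStaleCacheIsDetected(void) {
    Bucket    bucket = {0};
    CachedRef ref = { &bucket, NULL, 0 };
    int       ok;

    if (BucketCreate(&bucket, "ct1_work_queue", 42) == NULL) {
        return 0;
    }
    ok = (Resolve(&ref, "ct1_work_queue") != NULL);            /* fills the cache */
    ok = ok && (Resolve(&ref, "ct1_work_queue")->value == 42); /* fast path */
    BucketUnset(&bucket, "ct1_work_queue");   /* stands in for the other thread */
    ok = ok && (Resolve(&ref, "ct1_work_queue") == NULL);      /* stale: re-lookup */
    return ok;
}
```

In this sketch a single free invalidates all cached pointers into the
same bucket, which costs one extra lookup per cached object but never
dereferences freed memory. A finer-grained epoch per array name would
be possible, at the price of more bookkeeping.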

For the time being, it is more important to get a robust version out.
There are two more fixes already committed on bitbucket, which
were flagged by the testing of Wolfgang Winkler (many thanks!),
so I think we should treat 4.99.7 as a pre-release of 4.99.8,
which we could release next week or so.

all the best
-gustaf neumann



On 02.03.15 at 14:06, David Osborne wrote:
Thanks Gustaf.

I've overwritten the original core dump I sent to you, but this is the equivalent info from a new core (it was a seg fault this time, but it appears to be at exactly the same location). The arrayObj is 0x2b45a40084e0 in this case.

Does any of this help further?

PS. This does not happen every time. It's intermittent: maybe 50% of the times I run "make test".

(gdb) bt
#0  Ns_MutexLock (mutex=0x2b4500001004) at mutex.c:239
#1  0x00002b45989bf9ed in LockArrayObj (interp=interp@entry=0x2b45a4030ea0, arrayObj=0x2b45a40084e0, create=create@entry=0) at tclvar.c:1265
#2  0x00002b45989bef7e in NsTclNsvArrayObjCmd (UNUSED_clientData=<optimized out>, interp=0x2b45a4030ea0, objc=3, objv=0x2b45a40571b8) at tclvar.c:669
#3  0x00002b4599288e59 in TclEvalObjvInternal () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so.0
#4  0x00002b45992cf95e in TclExecuteByteCode () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so.0
#5  0x00002b4599312ce9 in TclObjInterpProcCore () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so.0
#6  0x00002b4599288e59 in TclEvalObjvInternal () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so.0
#7  0x00002b4599289b29 in TclEvalEx () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so.0
#8  0x00002b4599289473 in Tcl_EvalEx () from /usr/lib/x86_64-linux-gnu/libtcl8.5.so.0
#9  0x00002b45989ac1cb in Ns_TclEval (dsPtr=dsPtr@entry=0x0, server=<optimized out>, script=script@entry=0x2b459c382d74 "\n # If necessary due to running this code in a different environment, you\n # can have the newly spawned worker thread first source this file here.\n tst_cond_worker\n ") at tclinit.c:334
#10 0x00002b45989bd073 in NsTclThread (arg=0x2b459c382d60) at tclthread.c:834
#11 0x00002b459905184c in NsThreadMain (arg=<optimized out>) at thread.c:227
#12 0x00002b4599052839 in ThreadMain (arg=<optimized out>) at pthread.c:809
#13 0x00002b459a04f0a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00002b4599b7fccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) frame 1
#1 0x00002b45989bf9ed in LockArrayObj (interp=interp@entry=0x2b45a4030ea0, arrayObj=0x2b45a40084e0, create=create@entry=0) at tclvar.c:1265
1265  Ns_MutexLock(&(arrayPtr->bucketPtr->lock));

(gdb) list
1260        assert(interp != NULL);
1261        assert(arrayObj != NULL);
1262
1263        if (likely(Ns_TclGetOpaqueFromObj(arrayObj, arrayType, (void **) &arrayPtr) == TCL_OK)
1264            && arrayPtr->bucketPtr != NULL) {
1265            Ns_MutexLock(&(arrayPtr->bucketPtr->lock));
1266            arrayPtr->locks++;
1267        } else {
1268            NsInterp *itPtr = NsGetInterpData(interp);
1269

(gdb) print *arrayPtr
$41 = {bucketPtr = 0x2b4500001004, entryPtr = 0x2b459c320010, vars = {buckets = 0x1, staticBuckets = {[0] = 0x0, [1] = 0x2b45a4023210, [2] = 0x3557e6, [3] = 0x2b459c2190f0}, numBuckets = -1674442720, numEntries = 11077, rebuildSize = -1725047104, downShift = 11077, mask = -1725047072, keyType = 11077, findProc = 0x2b459957f5c0 <tclVarHashKeyType>, createProc = 0, typePtr = 0x6c757365722d207d}, locks = 13545984096477300}
(gdb) print arrayObj
$42 = (Tcl_Obj *) 0x2b45a40084e0

(gdb) print *arrayObj
$43 = {refCount = 3, bytes = 0x2b459c392dd0 "ct1_work_queue", length = 14, typePtr = 0x2b4598bf5ac0, internalRep = {longValue = 47577913202687, doubleValue = 2.3506612414264323e-310, otherValuePtr = 0x2b45989d97ff, wideValue = 47577913202687, twoPtrValue = {ptr1 = 0x2b45989d97ff, ptr2 = 0x2b459c2190f0}, ptrAndLongRep = {ptr = 0x2b45989d97ff, value = 47577972183280}}}

(gdb) x/s 0x2b45989d97ff
0x2b45989d97ff:  "nsv:array"

(gdb) print *arrayObj->typePtr
$46 = {name = 0x2b45989d7f03 "ns:addr", freeIntRepProc = 0, dupIntRepProc = 0, updateStringProc = 0x2b45989b3930 <UpdateStringOfAddr>,
  setFromAnyProc = 0x2b45989b39c0 <SetAddrFromAny>}







On 27 February 2015 at 20:48, Gustaf Neumann <neum...@wu.ac.at> wrote:

    Hi David,

    This is certainly not as expected, but I can't reproduce it.
    Does this happen on every run?

    It would be interesting to see the content of

       arrayObj=0xa05a8b0

    in frame #1, where the bytes should be the name of the array in
    question, the type should be "ns:addr", ptr1 should be "nsv:array",
    and ptr2 the arrayPtr. It would also be interesting to see the
    contents of arrayPtr.

    I've built and tested the server on various Unix systems; the closest
    was probably an Ubuntu 12.04.

    -g


    On 27.02.15 at 17:52, David Osborne wrote:
    Hi,

    We're looking at doing a build of Naviserver tagged as 4.99.7.
    But we seem to be hitting an intermittent seg fault or bus error
    during the "make test".

    Sometimes the tests complete cleanly.
    The crash often occurs while running ns_conn.test; usually
    ns_conn-1.2 passes and then the server crashes.
    This is on a Debian 7.8 server.

    Built by doing:
    ./autogen.sh  --with-tcl=/usr/lib/tcl8.5
    ./configure --with-tcl=/usr/lib/tcl8.5 --prefix=/usr/local/ns --enable-symbols --enable-threads
    make
    make test

    I have core dumps. There's a backtrace at the end of the bus
    error (but gdb isn't my area so apologies if there's nothing
    useful in there).

    Is this anything of concern?

-- David
    Qcode Software Limited
    http://www.qcode.co.uk


    Using host libthread_db library
    "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Core was generated by `./nsd/nsd -u root -c -d -t
    /root/naviserver/tests/test.nscfg /root/naviserver/t'.
    Program terminated with signal 7, Bus error.
    #0  Ns_MutexLock (mutex=0x206c617665757274) at mutex.c:239
    239         mutexPtr = GETMUTEX(mutex);
    (gdb) info threads
      Id   Target Id         Frame
      18   Thread 0x2b8bd7b5f700 (LWP 7173) 0x00002b8bd685b5f2 in Tcl_ExternalToUtfDString () from /usr/lib/libtcl8.5.so.0
      17   Thread 0x2b8bdc580700 (LWP 7192) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      16   Thread 0x2b8bdc37f700 (LWP 7189) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=30000) at ../sysdeps/unix/sysv/linux/poll.c:87
      15   Thread 0x2b8bdc17e700 (LWP 7188) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=30000) at ../sysdeps/unix/sysv/linux/poll.c:87
      14   Thread 0x2b8bdbf7d700 (LWP 7187) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=30000) at ../sysdeps/unix/sysv/linux/poll.c:87
      13   Thread 0x2b8bdbd7c700 (LWP 7186) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=30000) at ../sysdeps/unix/sysv/linux/poll.c:87
      12   Thread 0x2b8bdbb7b700 (LWP 7185) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=30000) at ../sysdeps/unix/sysv/linux/poll.c:87
      11   Thread 0x2b8bdb97a700 (LWP 7184) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=30000) at ../sysdeps/unix/sysv/linux/poll.c:87
      10   Thread 0x2b8bdb779700 (LWP 7183) 0x00002b8bd7075d13 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>,
        timeout=timeout@entry=10000) at ../sysdeps/unix/sysv/linux/poll.c:87
      9    Thread 0x2b8bdb578700 (LWP 7182) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      8    Thread 0x2b8bdb377700 (LWP 7181) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      7    Thread 0x2b8bdb176700 (LWP 7180) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      6    Thread 0x2b8bdaf75700 (LWP 7179) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      5    Thread 0x2b8bdad74700 (LWP 7178) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      4    Thread 0x2b8bda567700 (LWP 7177) pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
      3    Thread 0x2b8bd7ed7700 (LWP 7174) 0x00002b8bd707a453 in select () at ../sysdeps/unix/syscall-template.S:82
      2    Thread 0x2b8bd77522e0 (LWP 7172) do_sigwait (set=0x7fff74ac2f50, sig=0x7fff74ac2f4c)
        at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
    * 1    Thread 0x2b8bdcba2700 (LWP 7196) Ns_MutexLock (mutex=0x206c617665757274) at mutex.c:239
    (gdb) bt
    #0  Ns_MutexLock (mutex=0x206c617665757274) at mutex.c:239
    #1  0x00002b8bd5f569ed in LockArrayObj (interp=interp@entry=0xa0022a0, arrayObj=0xa05a8b0, create=create@entry=0) at tclvar.c:1265
    #2  0x00002b8bd5f55f7e in NsTclNsvArrayObjCmd (UNUSED_clientData=<optimized out>, interp=0xa0022a0, objc=3, objv=0xa060c38) at tclvar.c:669
    #3  0x00002b8bd681fdbe in ?? () from /usr/lib/libtcl8.5.so.0
    #4  0x00002b8bd68624be in ?? () from /usr/lib/libtcl8.5.so.0
    #5  0x00002b8bd68a427b in TclObjInterpProcCore () from /usr/lib/libtcl8.5.so.0
    #6  0x00002b8bd681fdbe in ?? () from /usr/lib/libtcl8.5.so.0
    #7  0x00002b8bd68209f5 in ?? () from /usr/lib/libtcl8.5.so.0
    #8  0x00002b8bd6820546 in Tcl_EvalEx () from /usr/lib/libtcl8.5.so.0
    #9  0x00002b8bd5f431cb in Ns_TclEval (dsPtr=dsPtr@entry=0x0, server=<optimized out>, script=script@entry=0x9fb2884 "\n        # If necessary due to running this code in a different environment, you\n        # can have the newly spawned worker thread first source this file here.\n  tst_cond_worker\n    ") at tclinit.c:334
    #10 0x00002b8bd5f54073 in NsTclThread (arg=0x9fb2870) at tclthread.c:834
    #11 0x00002b8bd65e884c in NsThreadMain (arg=<optimized out>) at thread.c:227
    #12 0x00002b8bd65e9839 in ThreadMain (arg=<optimized out>) at pthread.c:809
    #13 0x00002b8bd753bb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
    #14 0x00002b8bd708095d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
    #15 0x0000000000000000 in ?? ()
    (gdb) frame 15
    #15 0x0000000000000000 in ?? ()
    (gdb) frame 14
    #14 0x00002b8bd708095d in clone () at
    ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
    112     in ../sysdeps/unix/sysv/linux/x86_64/clone.S
    (gdb) frame 13
    #13 0x00002b8bd753bb50 in start_thread (arg=<optimized out>) at
    pthread_create.c:304
    304     pthread_create.c: No such file or directory.
    (gdb) frame 12
    #12 0x00002b8bd65e9839 in ThreadMain (arg=<optimized out>) at
    pthread.c:809
    809         NsThreadMain(arg);
    (gdb) frame 11
    #11 0x00002b8bd65e884c in NsThreadMain (arg=<optimized out>) at
    thread.c:227
    227         (*thrPtr->proc) (thrPtr->arg);
    (gdb) frame 10
    #10 0x00002b8bd5f54073 in NsTclThread (arg=0x9fb2870) at
    tclthread.c:834
    834         (void) Ns_TclEval(dsPtr, argPtr->server, argPtr->script);
    (gdb) frame 9
    #9  0x00002b8bd5f431cb in Ns_TclEval (dsPtr=dsPtr@entry=0x0,
    server=<optimized out>,
        script=script@entry=0x9fb2884 "\n        # If necessary due
    to running this code in a different environment, you\n        #
    can have the newly spawned worker thread first source this file
    here.\n  tst_cond_worker\n    ") at tclinit.c:334
    334             if (Tcl_EvalEx(interp, script, -1, 0) != TCL_OK) {
    (gdb) frame 8
    #8  0x00002b8bd6820546 in Tcl_EvalEx () from /usr/lib/libtcl8.5.so.0
    (gdb) frame 7
    #7  0x00002b8bd68209f5 in ?? () from /usr/lib/libtcl8.5.so.0
    (gdb) frame 6
    #6  0x00002b8bd681fdbe in ?? () from /usr/lib/libtcl8.5.so.0
    (gdb) frame 5
    #5  0x00002b8bd68a427b in TclObjInterpProcCore () from
    /usr/lib/libtcl8.5.so.0
    (gdb) frame 4
    #4  0x00002b8bd68624be in ?? () from /usr/lib/libtcl8.5.so.0
    (gdb) frame 3
    #3  0x00002b8bd681fdbe in ?? () from /usr/lib/libtcl8.5.so.0
    (gdb) frame 2
    #2  0x00002b8bd5f55f7e in NsTclNsvArrayObjCmd
    (UNUSED_clientData=<optimized out>, interp=0xa0022a0, objc=3,
    objv=0xa060c38)
        at tclvar.c:669
    669             arrayPtr = LockArrayObj(interp, objv[2], 0);
    (gdb) frame 1
    #1  0x00002b8bd5f569ed in LockArrayObj
    (interp=interp@entry=0xa0022a0, arrayObj=0xa05a8b0,
    create=create@entry=0) at tclvar.c:1265
    1265  Ns_MutexLock(&(arrayPtr->bucketPtr->lock));
    (gdb) frame 0
    #0  Ns_MutexLock (mutex=0x206c617665757274) at mutex.c:239
    239         mutexPtr = GETMUTEX(mutex);
    (gdb) list
    234         Ns_GetTime(&startTime);
    235     #endif
    236
    237         assert(mutex != NULL);
    238
    239         mutexPtr = GETMUTEX(mutex);
    240         if (unlikely(!NsLockTry(mutexPtr->lock))) {
    241             NsLockSet(mutexPtr->lock);
    242             ++mutexPtr->nbusy;
    243
    (gdb)




_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
