Hi Gustaf many thanks for that. Our version of OpenACS didn't have the tracing feature, but I was able to add it.
We don't issue any "ns_set cleanup" ourselves, but when I added your recommendation of tracing "ns_set", I saw plenty in the logs. The docs say that "This command is autoamtically executed by ns_cleanup, which runs after every request, freeing all sets created via ns_set", so I presume that is where the calls come from. Is there something in particular I should be looking for in the trace output? There is an enormous amount of information the logs. I ran another backtrace and this time the error was a little bit different ("error: Cannot access memory at address 0x2" - see below). I will continue to try and find out which ns_set is the source of the issue (but there may be multiple). Do you have any theory what change in 4.99.25 caused this to stop working? thanks Brian [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set iget t7 content-type [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set iget t7 content-type [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set ifind t7 cache-control [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set ifind t7 expires [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set size t7 [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set key t7 0 [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set size t7 [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set update t7 Expires Wed, 16 Aug 2023 13:05:13 GMT [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set put t7 Pragma no-cache [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set put t7 Cache-Control no-cache [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set cleanup free(): invalid next size (fast) Thread 4 "nsd" received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff4b6f640 (LWP 13801)] __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) backtrace #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140737299019328, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155 #6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19740 "free(): invalid next size (fast)") at ./malloc/malloc.c:5664 #7 0x00007ffff7cddb1d in _int_free (av=0x7fffdc000030, p=0x7fffdd865320, have_lock=0) at ./malloc/malloc.c:4522 #8 0x00007ffff7ce04d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391 #9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffdd865330) at memory.c:94 #10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffdde381a0) at set.c:397 #11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffdc035570, interp=0x7fffdc005250, objc=2, objv=0x7fffdc245650) at tclset.c:330 #12 0x00007ffff79cb18e in Dispatch (data=0x7fffdc34afc8, interp=0x7fffdc005250, result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467 #13 0x00007ffff79cb21f in TclNRRunCallbacks (interp=0x7fffdc005250, result=0, rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503 #14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffdc005250, objc=1, objv=0x7fffdc2453f0, flags=2097168) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4226 #15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0, line=1, clNextOuter=0x0, outerScript=0x7ffff4b6e880 "ns_cleanup") at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5372 #16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5037 #17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffdc005250, cbPtr=0x5555556a49d0, resultDString=0x0) at tclcallbacks.c:186 #18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffdc005250, arg=0x5555556a49d0) at tclinit.c:1913 #19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffdc035570, why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375 #20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffdc035570) at tclinit.c:2026 #21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x555555630950) at tclinit.c:1885 #22 0x00007ffff7efdf11 in ConnRun (connPtr=0x555555630950) at queue.c:2648 #23 0x00007ffff7efd0de in NsConnThread (arg=0x55555564bea0) at queue.c:2211 #24 0x00007ffff7bdd734 in NsThreadMain (arg=0x555558577960) at thread.c:232 #25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x555558577960) at pthread.c:870 #26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 #27 0x00007ffff7d61a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb) bt full #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:44 tid = <optimized out> ret = 0 pd = 0x7ffff4b6f640 old_mask = {__val = {68118949824, 0, 140737349500896, 140737349117410, 140737299013472, 140736886756960, 140736887828416, 15, 140737299013552, 140737348535152, 140736886756976, 140736884396624, 140736886755600, 140736884396624, 140736884381136, 140736886756960}} ret = <optimized out> pd = <optimized out> old_mask = <optimized out> ret = <optimized out> tid = <optimized out> ret = <optimized out> resultvar = <optimized out> resultvar = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> __futex = <optimized out> resultvar = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> __futex = <optimized out> __private = <optimized out> __oldval = <optimized out> result = <optimized out> #1 __pthread_kill_internal (signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:78 No locals. #2 __GI___pthread_kill (threadid=140737299019328, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 No locals. #3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 ret = <optimized out> #4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x7fffdc245a30, sa_sigaction = 0x7fffdc245a30}, sa_mask = {__val = {140736886757216, 140737299014196, 140736884396624, 140736886756976, 140736886757568, 140736886757664, 140736886757840, 140736886757936, 140736886756704, 2203128881648, 140737299013943, 18446744069414584324, 140737349538724, 140733193388042, 140737299014196, 18446744069414584328}}, sa_flags = -600526912, sa_restorer = 0x7ffff4b6e239} --Type <RET> for more, q to quit, c to continue without paging--c sigs = {__val = {32, 93825009984944, 0, 0, 3, 140733193388035, 85899345920, 12884901888, 0, 0, 8589934591, 140736886757664, 140736886757664, 140737299014201, 140736886757840, 140736886757936}} #5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155 ap = {{gp_offset = 24, fp_offset = 3, overflow_arg_area = 0x7ffff4b6e240, reg_save_area = 0x7ffff4b6e1d0}} fd = <optimized out> list = <optimized out> nlist = <optimized out> cp = <optimized out> #6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19740 "free(): invalid next size (fast)") at ./malloc/malloc.c:5664 No locals. #7 0x00007ffff7cddb1d in _int_free (av=0x7fffdc000030, p=0x7fffdd865320, have_lock=0) at ./malloc/malloc.c:4522 fail = <optimized out> idx = <optimized out> old = <optimized out> old2 = <optimized out> size = 32 fb = <optimized out> nextchunk = <optimized out> nextsize = <optimized out> nextinuse = <optimized out> prevsize = <optimized out> bck = <optimized out> fwd = <optimized out> __PRETTY_FUNCTION__ = "_int_free" #8 0x00007ffff7ce04d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391 ar_ptr = <optimized out> p = <optimized out> err = 25 #9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffdd865330) at memory.c:94 No locals. #10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffdde381a0) at set.c:397 i = 8 __PRETTY_FUNCTION__ = "Ns_SetFree" #11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffdc035570, interp=0x7fffdc005250, objc=2, objv=0x7fffdc245650) at tclset.c:330 key = 0x7fffdc3aa070 "d6" itPtr = 0x7fffdc035570 set = 0x7fffdde381a0 ds = {string = 0x2 <error: Cannot access memory at address 0x2>, length = -601598384, spaceAvl = 32767, staticSpace = "\300\344\266\364\377\177\000\000PV$\334\377\177\000\000\v\350b\336\002\000\000\000\200}\021\334\377\177\000\000H\345\266\364\377\177\000\000PR\000\334\377\177\000\000\000\345\266\364\377\177\000\000\025", '\000' <repeats 15 times>, "PR\000\334\377\177\000\000\220n\004\334\377\177\000\000\340\266\062\334\377\177\000\000@\207\236ZUU\000\000\000%9\210\226wv\035\300\345\266\364\377\177\000\000\230\256\234\367\377\177\000\000`\345\266\364\377\177\000\000p\203\252\367\000\000\000\000PR\000\334\377\177\000\000\021\334\377\177\000\000\000\017\000\334\377\177\000\000\000\000\000\000\020\000 \000\002\000\000\000\000\000\000\000\220n\004\334\377\177\000\000\000\000\000\000\000\000\000"} tablePtr = 0x7fffdc035740 hPtr = 0x7fffdc3aa050 search = {tablePtr = 0x7fffdc035740, nextIndex = 3, nextEntryPtr = 0x7fffda7c9790} opt = 1 result = 0 opts = {0x7ffff7f89745 "array", 0x7ffff7f8974b "cleanup", 0x7ffff7f89753 "copy", 0x7ffff7f89758 "cput", 0x7ffff7f8975d "create", 0x7ffff7f89764 "delete", 0x7ffff7f8976b "delkey", 0x7ffff7f89772 "find", 0x7ffff7f89777 "free", 0x7ffff7f8977c "get", 0x7ffff7f89780 "icput", 0x7ffff7f89786 "idelkey", 0x7ffff7f8978e "ifind", 0x7ffff7f89794 "iget", 0x7ffff7f89799 "imerge", 0x7ffff7f897a0 "isnull", 0x7ffff7f897a7 "iunique", 0x7ffff7f897af "iupdate", 0x7ffff7f897b7 "key", 0x7ffff7f897bb "keys", 0x7ffff7f897c0 "list", 0x7ffff7f897c5 "merge", 0x7ffff7f897cb "move", 0x7ffff7f897d0 "name", 0x7ffff7f897d5 "new", 0x7ffff7f897d9 "print", 0x7ffff7f897df "put", 0x7ffff7f897e3 "size", 0x7ffff7f897e8 "split", 0x7ffff7f897ee "truncate", 0x7ffff7f897f7 "unique", 0x7ffff7f897fe "update", 0x7ffff7f89805 "value", 0x7ffff7f8980b "values", 0x0} SArrayIdx = SArrayIdx SCleanupIdx = SCleanupIdx SCopyIdx = SCopyIdx SCPutIdx = SCPutIdx SCreateidx = SCreateidx SDeleteIdx = SDeleteIdx SDelkeyIdx = SDelkeyIdx SFindIdx = SFindIdx SFreeIdx = SFreeIdx SGetIdx = SGetIdx SICPutIdx = SICPutIdx SIDelkeyIdx = SIDelkeyIdx SIFindIdx = SIFindIdx SIGetIdx = SIGetIdx SIMergeIdx = SIMergeIdx SIsNullIdx = SIsNullIdx SIUniqueIdx = SIUniqueIdx SIUpdateIdx = SIUpdateIdx SKeyIdx = SKeyIdx SKeysIdx = SKeysIdx SListIdx = SListIdx SMergeIdx = SMergeIdx SMoveIdx = SMoveIdx sINameIdx = sINameIdx SNewIdx = SNewIdx SPrintIdx = SPrintIdx SPutIdx = SPutIdx SSizeIdx = SSizeIdx SSplitIdx = SSplitIdx STruncateIdx = STruncateIdx SUniqueIdx = SUniqueIdx SUpdateIdx = SUpdateIdx SValueIdx = SValueIdx SValuesIdx = SValuesIdx __PRETTY_FUNCTION__ = "NsTclSetObjCmd" #12 0x00007ffff79cb18e in Dispatch (data=0x7fffdc34afc8, interp=0x7fffdc005250, result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467 objProc = 0x7ffff7f3df2d <NsTclSetObjCmd> clientData = 0x7fffdc035570 objc = 2 objv = 0x7fffdc245650 iPtr = 0x7fffdc005250 #13 0x00007ffff79cb21f in TclNRRunCallbacks (interp=0x7fffdc005250, result=0, rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503 callbackPtr = 0x7fffdc34afc0 procPtr = 0x7ffff79cb10e <Dispatch> iPtr = 0x7fffdc005250 #14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffdc005250, objc=1, objv=0x7fffdc2453f0, flags=2097168) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4226 result = 0 rootPtr = 0x0 #15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0, line=1, clNextOuter=0x0, outerScript=0x7ffff4b6e880 "ns_cleanup") at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5372 wordLine = 1 wordCLNext = 0x0 objectsNeeded = 1 wordStart = 0x7ffff4b6e880 "ns_cleanup" numWords = 1 iPtr = 0x7fffdc005250 p = 0x7ffff4b6e880 "ns_cleanup" next = 0x1f4b6e820 <error: Cannot access memory at address 0x1f4b6e820> minObjs = 20 objv = 0x7fffdc2453f0 objvSpace = 0x7fffdc2453f0 expand = 0x7fffdc2454a0 lines = 0x7fffdc245500 lineSpace = 0x7fffdc245500 tokenPtr = 0x7fffdc2451d0 commandLength = 32767 bytesLeft = 10 expandRequested = 0 code = 0 savedVarFramePtr = 0x7fffdc001550 allowExceptions = 0 gotParse = 1 i = 4105627376 objectsUsed = 1 parsePtr = 0x7fffdc245140 eeFramePtr = 0x7fffdc245390 stackObjArray = 0x7fffdc2453f0 expandStack = 0x7fffdc2454a0 linesStack = 0x7fffdc245500 clNext = 0x0 #16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5037 No locals. #17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffdc005250, cbPtr=0x5555556a49d0, resultDString=0x0) at tclcallbacks.c:186 arg = 0x0 ii = 0 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7ffff4b6ea10, reg_save_area = 0x7ffff4b6e950}} ds = {string = 0x7ffff4b6e880 "ns_cleanup", length = 10, spaceAvl = 200, staticSpace = "ns_cleanup\000\367\377\177\000\000\300\350\266\364\377\177\000\000P\351\266\364\377\177\000\000\210\060jUUU\000\000@\351\266\364\377\177\000\000\000\351\266\364\377\177\000\000\340\tcU\001\001\001\000\340\350\266\364\377\177\000\000\360\350\266\364\377\177\000\000\020\351\266\364\377\177\000\000\270\060jUUU\000\000\020\351\266\364\377\177\000\000\332\356\275\367\377\177\000\000\211\311\334d\000\000\000\000 1jUUU\000\000\000\000\000\000\000\000\000\000\270\060jU\005\000\000\000\220\351\266\364\377\177\000\000\023\275\275\367\377\177\000\000\060\352\266\364\377\177\000\000ยข\362\367\377\177\000\000\211\311\334d\000\000\000\000p\fJ\334\b\000\000\000\060JjUUU\000"} deallocInterp = false status = 1 __PRETTY_FUNCTION__ = "Ns_TclEvalCallback" #18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffdc005250, arg=0x5555556a49d0) at tclinit.c:1913 cbPtr = 0x5555556a49d0 result = 0 #19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffdc035570, why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375 tracePtr = 0x5555556a4a30 servPtr = 0x55555562b470 __PRETTY_FUNCTION__ = "RunTraces" #20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffdc035570) at tclinit.c:2026 interp = 0x7fffdc005250 ok = true __PRETTY_FUNCTION__ = "PushInterp" #21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x555555630950) at tclinit.c:1885 itPtr = 0x7fffdc035570 #22 0x00007ffff7efdf11 in ConnRun (connPtr=0x555555630950) at queue.c:2648 sockPtr = 0x7fffd8001d80 conn = 0x555555630950 servPtr = 0x55555562b470 status = NS_OK auth = 0x0 __PRETTY_FUNCTION__ = "ConnRun" #23 0x00007ffff7efd0de in NsConnThread (arg=0x55555564bea0) at queue.c:2211 argPtr = 0x55555564bea0 poolPtr = 0x555555630660 servPtr = 0x55555562b470 connPtr = 0x555555630950 wait = {sec = 1692191231, usec = 653861} timePtr = 0x7ffff4b6ec20 threadId = 0 duringShutdown = 220 fromQueue = false cpt = 1000 ncons = 999 current = 2 status = NS_OK timeout = {sec = 120, usec = 0} exitMsg = 0x7fffdc000b70 "" joinThread = 0x7ffff4b6f640 threadsLockPtr = 0x5555556306d0 tqueueLockPtr = 0x555555630718 wqueueLockPtr = 0x5555556306a8 __PRETTY_FUNCTION__ = "NsConnThread" #24 0x00007ffff7bdd734 in NsThreadMain (arg=0x555558577960) at thread.c:232 thrPtr = 0x555558577960 #25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x555558577960) at pthread.c:870 No locals. #26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 ret = <optimized out> pd = <optimized out> out = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488346768, 7820541724095505063, 140737299019328, 0, 140737350793296, 140737488347120, -7820557727598026073, -7820559187479928153}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> #27 0x00007ffff7d61a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 No locals. ________________________________ From: Gustaf Neumann <neum...@wu.ac.at> Sent: Tuesday 15 August 2023 6:16 pm To: naviserver-devel@lists.sourceforge.net <naviserver-devel@lists.sourceforge.net> Subject: Re: [naviserver-devel] Crashing on all versions >4.99.24 on Ubuntu Brian, many thanks, the backtrace gives some insights: The problem happens in a Ns_SetFree operation if set "d8" triggered by an "ns_set cleanup" during the cleanup of the request. Something is broken with this nsset. Can it be that your application package issues "ns_set cleanup" as well? Do you have a recent version of OpenACS? If so, you can turn on tracing of the "ns_set" command by adding it to the "traced_cmds". -g ================================================================================= --- acs-tcl/tcl/tcltrace-init.tcl 27 Nov 2020 09:52:06 -0000 1.4.2.4 +++ acs-tcl/tcl/tcltrace-init.tcl 15 Aug 2023 17:13:51 -0000 @@ -32,6 +32,7 @@ #set traced_cmds {::ns_setcookie ::ns_getcookie ::ns_deletecookie} #set traced_cmds {::ns_return ::ns_returnnotfound ::ns_returnfile ::ns_returnmoved} #set traced_cmds [lsort [info commands ::ns_return*]] +set traced_cmds {::ns_set} foreach cmd $traced_cmds { append trace "\ntrace add execution $cmd enter {::tcltrace::before}" } ================================================================================= On 15.08.23 15:51, Brian Fenton wrote: Hi I reproduced the problem using the install-ns.sh script running under gdb. Here's the output of backtrace and bt full. I'm new to using gdb so please let me know if you'd like to see some other info.
_______________________________________________ naviserver-devel mailing list naviserver-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/naviserver-devel