Hi
I reproduced the problem using the install-ns.sh script running under gdb.
Here's the output of backtrace and bt full. I'm new to using gdb so please let
me know if you'd like to see some other info.
[15/Aug/2023:13:56:52][13147.7fffe35fe640][-driver:nsssl:0-] Notice: ...
sockAccept accepted 2 connections
free(): invalid next size (fast)
Thread 4 "nsd" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff4ad8640 (LWP 13651)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737298400832) at
./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737016493632)
at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737016493632) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737016493632, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19230
"munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5664
#7 0x00007ffff7cdc05c in munmap_chunk (p=<optimized out>) at
./malloc/malloc.c:3060
#8 0x00007ffff7ce051a in __GI___libc_free (mem=<optimized out>) at
./malloc/malloc.c:3381
#9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffd4de0ba0) at memory.c:94
#10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffd5886210) at set.c:397
#11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffd403d590,
interp=0x7fffd4005250, objc=2, objv=0x7fffd453a510) at tclset.c:330
#12 0x00007ffff79cb18e in Dispatch (data=0x7fffd410e3b8, interp=0x7fffd4005250,
result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467
#13 0x00007ffff79cb21f in TclNRRunCallbacks (interp=0x7fffd4005250, result=0,
rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503
#14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffd4005250, objc=1,
objv=0x7fffd453a2b0, flags=2097168) at
/usr/local/src/tcl8.6.13/generic/tclBasic.c:4226
#15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffd4005250,
script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0, line=1,
clNextOuter=0x0,
outerScript=0x7fffe3dfe880 "ns_cleanup") at
/usr/local/src/tcl8.6.13/generic/tclBasic.c:5372
#16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffd4005250,
script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0) at
/usr/local/src/tcl8.6.13/generic/tclBasic.c:5037
#17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffd4005250,
cbPtr=0x5555556a1b30, resultDString=0x0) at tclcallbacks.c:186
#18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffd4005250,
arg=0x5555556a1b30) at tclinit.c:1913
#19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffd403d590,
why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375
#20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffd403d590) at tclinit.c:2026
#21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x55555562ebd0) at
tclinit.c:1885
#22 0x00007ffff7efdf11 in ConnRun (connPtr=0x55555562ebd0) at queue.c:2648
#23 0x00007ffff7efd0de in NsConnThread (arg=0x555555649030) at queue.c:2211
#24 0x00007ffff7bdd734 in NsThreadMain (arg=0x55555855cdc0) at thread.c:232
#25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x55555855cdc0) at pthread.c:870
#26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at
./nptl/pthread_create.c:442
#27 0x00007ffff7d61a00 in clone3 () at
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) bt full
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737016493632)
at ./nptl/pthread_kill.c:44
tid = <optimized out>
ret = 0
pd = 0x7fffe3dff640
old_mask = {__val = {140737016487840, 140736755639472, 3823099840, 512,
140737016487920, 140737348535152, 140736755639488, 140736750178896,
140736755638224,
140733193388032, 140736750163408, 140736755639472, 140736755639432,
140736752725904, 93825010219632, 93825010219632}}
ret = <optimized out>
pd = <optimized out>
old_mask = <optimized out>
ret = <optimized out>
tid = <optimized out>
ret = <optimized out>
resultvar = <optimized out>
resultvar = <optimized out>
__arg3 = <optimized out>
__arg2 = <optimized out>
__arg1 = <optimized out>
_a3 = <optimized out>
_a2 = <optimized out>
_a1 = <optimized out>
__futex = <optimized out>
resultvar = <optimized out>
__arg3 = <optimized out>
__arg2 = <optimized out>
__arg1 = <optimized out>
_a3 = <optimized out>
_a2 = <optimized out>
_a1 = <optimized out>
__futex = <optimized out>
__private = <optimized out>
__oldval = <optimized out>
result = <optimized out>
#1 __pthread_kill_internal (signo=6, threadid=140737016493632) at
./nptl/pthread_kill.c:78
No locals.
#2 __GI___pthread_kill (threadid=140737016493632, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
No locals.
#3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
ret = <optimized out>
#4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x600000004, sa_sigaction =
0x600000004}, sa_mask = {__val = {140736789161264, 140733193388042,
140737347688968,
140737016488736, 279037356156, 18446744073709551615,
140736755314240, 140737016488272, 140733193388033, 140736792190256,
140736790551568, 0, 140736755639936,
93825035611088, 140736753589312, 140736756049120}}, sa_flags =
1487610384, sa_restorer = 0x1}
sigs = {__val = {32, 140737350793296, 140737488347040, 140737350862035,
93824993127520, 140736755639576, 8589934656, 93825010219632, 25769803776,
193273528320,
140737016488160, 140737349119905, 3823100240, 4294967296,
2202846355952, 3556773632}}
#5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
ap = {{gp_offset = 24, fp_offset = 0, overflow_arg_area =
0x7fffe3dfe2a0, reg_save_area = 0x7fffe3dfe230}}
fd = <optimized out>
list = <optimized out>
nlist = <optimized out>
cp = <optimized out>
#6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19230
"munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5664
No locals.
#7 0x00007ffff7cdc05c in munmap_chunk (p=<optimized out>) at
./malloc/malloc.c:3060
pagesize = <optimized out>
size = <optimized out>
__PRETTY_FUNCTION__ = "munmap_chunk"
mem = <optimized out>
block = <optimized out>
total_size = <optimized out>
#8 0x00007ffff7ce051a in __GI___libc_free (mem=<optimized out>) at
./malloc/malloc.c:3381
ar_ptr = <optimized out>
p = <optimized out>
err = 25
#9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffd4de0ba0) at memory.c:94
No locals.
#10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffd5886210) at set.c:397
i = 10
__PRETTY_FUNCTION__ = "Ns_SetFree"
#11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffd403d590,
interp=0x7fffd4005250, objc=2, objv=0x7fffd453a510) at tclset.c:330
key = 0x7fffd464eb50 "d8"
itPtr = 0x7fffd403d590
set = 0x7fffd5886210
ds = {string = 0x7fffd6650c80 "%", length = -738176432, spaceAvl =
32767,
staticSpace =
"\320\344\337\343\377\177\000\000\240\236t\336\377\177\000\000\260\356\004\324\377\177\000\000PZ\000\324\377\177\000\000\000\345\337\343\377\177\000\000\312Ĝ\367\377\177\000\000\360\357Y\324\377\177\000\000\000\000\000\000\000\000\000\000\200\fe\326\377\177\000\000PR\000\324\377\177\000\000\000\000\000\000\001\000\000\000PR\000\324\377\177\000\000PZ\000\324\377\177\000\000\260\356\004\324\377\177\000\000\300\345\337\343\377\177\000\000Э\234\367\377\177\000\000`\345\337\343\377\177\000\000p\203\252\367\000\000\000\000PR\000\324\377\177\000\000H\003Z\324\377\177\000\000\000\017\000\324\377\177\000\000\000\000\000\000\020\000
\000\002\000\000\000\377\177\000\000\260\356\004\324\377\177\000\000\000\000\000\000\000\000\000"}
tablePtr = 0x7fffd403d760
hPtr = 0x7fffd464eb30
search = {tablePtr = 0x7fffd403d760, nextIndex = 13, nextEntryPtr = 0x0}
opt = 1
result = 0
opts = {0x7ffff7f89745 "array", 0x7ffff7f8974b "cleanup",
0x7ffff7f89753 "copy", 0x7ffff7f89758 "cput", 0x7ffff7f8975d "create",
0x7ffff7f89764 "delete", 0x7ffff7f8976b "delkey", 0x7ffff7f89772 "find",
0x7ffff7f89777 "free", 0x7ffff7f8977c "get", 0x7ffff7f89780 "icput",
0x7ffff7f89786 "idelkey", 0x7ffff7f8978e "ifind", 0x7ffff7f89794 "iget",
0x7ffff7f89799 "imerge", 0x7ffff7f897a0 "isnull", 0x7ffff7f897a7 "iunique",
0x7ffff7f897af "iupdate", 0x7ffff7f897b7 "key", 0x7ffff7f897bb "keys",
0x7ffff7f897c0 "list", 0x7ffff7f897c5 "merge", 0x7ffff7f897cb "move",
0x7ffff7f897d0 "name", 0x7ffff7f897d5 "new", 0x7ffff7f897d9 "print",
0x7ffff7f897df "put", 0x7ffff7f897e3 "size", 0x7ffff7f897e8 "split",
0x7ffff7f897ee "truncate", 0x7ffff7f897f7 "unique", 0x7ffff7f897fe "update",
0x7ffff7f89805 "value", 0x7ffff7f8980b "values", 0x0}
SArrayIdx = SArrayIdx
SCleanupIdx = SCleanupIdx
SCopyIdx = SCopyIdx
SCPutIdx = SCPutIdx
SCreateidx = SCreateidx
SDeleteIdx = SDeleteIdx
SDelkeyIdx = SDelkeyIdx
SFindIdx = SFindIdx
SFreeIdx = SFreeIdx
SGetIdx = SGetIdx
SICPutIdx = SICPutIdx
SIDelkeyIdx = SIDelkeyIdx
SIFindIdx = SIFindIdx
SIGetIdx = SIGetIdx
SIMergeIdx = SIMergeIdx
SIsNullIdx = SIsNullIdx
SIUniqueIdx = SIUniqueIdx
SIUpdateIdx = SIUpdateIdx
SKeyIdx = SKeyIdx
SKeysIdx = SKeysIdx
SListIdx = SListIdx
SMergeIdx = SMergeIdx
SMoveIdx = SMoveIdx
sINameIdx = sINameIdx
SNewIdx = SNewIdx
SPrintIdx = SPrintIdx
SPutIdx = SPutIdx
SSizeIdx = SSizeIdx
SSplitIdx = SSplitIdx
STruncateIdx = STruncateIdx
SUniqueIdx = SUniqueIdx
SUpdateIdx = SUpdateIdx
SValueIdx = SValueIdx
SValuesIdx = SValuesIdx
__PRETTY_FUNCTION__ = "NsTclSetObjCmd"
#12 0x00007ffff79cb18e in Dispatch (data=0x7fffd410e3b8, interp=0x7fffd4005250,
result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467
objProc = 0x7ffff7f3df2d <NsTclSetObjCmd>
clientData = 0x7fffd403d590
objc = 2
objv = 0x7fffd453a510
iPtr = 0x7fffd4005250
#13 0x00007ffff79cb21f in TclNRRunCallbacks (interp=0x7fffd4005250, result=0,
rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503
callbackPtr = 0x7fffd410e3b0
procPtr = 0x7ffff79cb10e <Dispatch>
iPtr = 0x7fffd4005250
#14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffd4005250, objc=1,
objv=0x7fffd453a2b0, flags=2097168) at
/usr/local/src/tcl8.6.13/generic/tclBasic.c:4226
result = 0
rootPtr = 0x0
#15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffd4005250,
script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0, line=1,
clNextOuter=0x0, outerScript=0x7fffe3dfe880 "ns_cleanup") at
/usr/local/src/tcl8.6.13/generic/tclBasic.c:5372
wordLine = 1
wordCLNext = 0x0
objectsNeeded = 1
wordStart = 0x7fffe3dfe880 "ns_cleanup"
numWords = 1
iPtr = 0x7fffd4005250
p = 0x7fffe3dfe880 "ns_cleanup"
next = 0x1e3dfe820 <error: Cannot access memory at address 0x1e3dfe820>
minObjs = 20
objv = 0x7fffd453a2b0
objvSpace = 0x7fffd453a2b0
expand = 0x7fffd453a360
lines = 0x7fffd453a3c0
lineSpace = 0x7fffd453a3c0
tokenPtr = 0x7fffd453a090
commandLength = 32767
bytesLeft = 10
expandRequested = 0
code = 0
savedVarFramePtr = 0x7fffd4001550
allowExceptions = 0
gotParse = 1
i = 3823101680
objectsUsed = 1
parsePtr = 0x7fffd453a000
eeFramePtr = 0x7fffd453a250
stackObjArray = 0x7fffd453a2b0
expandStack = 0x7fffd453a360
linesStack = 0x7fffd453a3c0
clNext = 0x0
#16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffd4005250,
script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0) at
/usr/local/src/tcl8.6.13/generic/tclBasic.c:5037
No locals.
#17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffd4005250,
cbPtr=0x5555556a1b30, resultDString=0x0) at tclcallbacks.c:186
arg = 0x0
ii = 0
ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area =
0x7fffe3dfea10, reg_save_area = 0x7fffe3dfe950}}
ds = {string = 0x7fffe3dfe880 "ns_cleanup", length = 10, spaceAvl =
200, staticSpace =
"ns_cleanup\000\367\377\177\000\000\300\350\337\343\377\177\000\000P\351\337\343\377\177\000\000\210\277jUUU\000\000@\351\337\343\377\177\000\000\000\351\337\343\377\177\000\000`\354bU\001\001\001\000\340\350\337\343\377\177\000\000\360\350\337\343\377\177\000\000\020\351\337\343\377\177\000\000\270\277jUUU\000\000\020\351\337\343\377\177\000\000\332\356\275\367\377\177\000\000\223z\333d\000\000\000\000
\300jUUU\000\000\000\000\000\000\000\000\000\000\270\277jU\005\000\000\000\220\351\337\343\377\177\000\000\023\275\275\367\377\177\000\000\060\352\337\343\377\177\000\000¢\362\367\377\177\000\000\223z\333d\000\000\000\000P\340\025\324\b\000\000\000\220\033jUUU\000"}
deallocInterp = false
status = 1
__PRETTY_FUNCTION__ = "Ns_TclEvalCallback"
#18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffd4005250,
arg=0x5555556a1b30) at tclinit.c:1913
cbPtr = 0x5555556a1b30
result = 0
#19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffd403d590,
why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375
tracePtr = 0x5555556a1b90
servPtr = 0x555555628560
__PRETTY_FUNCTION__ = "RunTraces"
#20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffd403d590) at tclinit.c:2026
interp = 0x7fffd4005250
ok = true
__PRETTY_FUNCTION__ = "PushInterp"
#21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x55555562ebd0) at
tclinit.c:1885
itPtr = 0x7fffd403d590
#22 0x00007ffff7efdf11 in ConnRun (connPtr=0x55555562ebd0) at queue.c:2648
sockPtr = 0x7fffd98f68a0
conn = 0x55555562ebd0
servPtr = 0x555555628560
status = NS_OK
auth = 0x0
__PRETTY_FUNCTION__ = "ConnRun"
#23 0x00007ffff7efd0de in NsConnThread (arg=0x555555649030) at queue.c:2211
argPtr = 0x555555649030
poolPtr = 0x55555562d7c0
servPtr = 0x555555628560
connPtr = 0x55555562ebd0
wait = {sec = 1692105481, usec = 312006}
timePtr = 0x7fffe3dfec20
threadId = 1
duringShutdown = 219
fromQueue = true
cpt = 1000
ncons = 996
current = 2
status = NS_OK
timeout = {sec = 120, usec = 0}
exitMsg = 0x7fffd4000b70 ""
joinThread = 0x7fffe3dff640
threadsLockPtr = 0x55555562d830
tqueueLockPtr = 0x55555562d878
wqueueLockPtr = 0x55555562d808
__PRETTY_FUNCTION__ = "NsConnThread"
#24 0x00007ffff7bdd734 in NsThreadMain (arg=0x55555855cdc0) at thread.c:232
thrPtr = 0x55555855cdc0
#25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x55555855cdc0) at pthread.c:870
No locals.
#26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at
./nptl/pthread_create.c:442
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488346688,
-3886469656811452993, 140737016493632, 0, 140737350793296, 140737488347040,
3886531503754790335, 3886487635365545407}, mask_was_saved = 0}}, priv = {pad =
{0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#27 0x00007ffff7d61a00 in clone3 () at
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.
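For reading a long trace like the one above, a small script can help pick out the frames that belong to NaviServer rather than to glibc or Tcl. The following Python sketch is only an illustration, not a gdb or NaviServer tool. The names parse_frames and local_frames are hypothetical, it expects plain "backtrace" output (not "bt full"), and the rule "a source path without a directory is a NaviServer frame" is a heuristic that merely fits this particular trace.

```python
import re

# Matches one gdb frame after wrapped lines have been joined, e.g.
# "#9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffd4de0ba0) at memory.c:94"
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>\S+) \(.*\)"
    r" at (?P<file>\S+):(?P<line>\d+)$"
)

# Excerpt copied from the backtrace in this mail (frames #8 to #12).
SAMPLE = """\
#8 0x00007ffff7ce051a in __GI___libc_free (mem=<optimized out>) at
./malloc/malloc.c:3381
#9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffd4de0ba0) at memory.c:94
#10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffd5886210) at set.c:397
#11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffd403d590,
interp=0x7fffd4005250, objc=2, objv=0x7fffd453a510) at tclset.c:330
#12 0x00007ffff79cb18e in Dispatch (data=0x7fffd410e3b8, interp=0x7fffd4005250,
result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467
"""


def parse_frames(backtrace):
    """Return (number, function, file, line) tuples from plain backtrace text."""
    joined = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            joined.append(line)          # start of a new frame
        elif joined:
            joined[-1] += " " + line     # continuation of a wrapped frame
    frames = []
    for entry in joined:
        m = FRAME_RE.match(entry)
        if m:
            frames.append((int(m["num"]), m["func"], m["file"], int(m["line"])))
    return frames


def local_frames(frames):
    """Heuristic (assumption): frames whose source path has no directory part."""
    return [f for f in frames if "/" not in f[2]]


if __name__ == "__main__":
    for num, func, path, line in local_frames(parse_frames(SAMPLE)):
        print(f"#{num} {func} at {path}:{line}")
```

On the excerpt, the frames that remain are ns_free, Ns_SetFree and NsTclSetObjCmd, i.e. the path from the Tcl-level set cleanup down to the call into free().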
thanks
Brian
________________________________
From: Brian Fenton <[email protected]>
Sent: Monday 14 August 2023 5:40 pm
To: [email protected]
<[email protected]>
Subject: Re: [naviserver-devel] Crashing on all versions >4.99.24 on Ubuntu
Hi Gustaf
thanks again for the advice. Today I made some more progress on this. There
do appear to be some differences between your script and the Oupfiz5
installer, e.g. his ns-build.sh script
(https://github.com/oupfiz5/tcl-build/blob/master/src/builds/ns-build.sh). I
have reached the conclusion that I will be wasting your time if I can't
reproduce this problem using your scripts, so my next task will be to run your
script and try to reproduce it. I am now seeing the downsides of using a
non-official Docker approach!
Today I took the approach of installing (through the APM) our OpenACS packages
one by one. For example, we use packages such as Categories, General Comments
etc as well as many of our own custom packages. After each package I bounced
Naviserver and tested the site. The system worked perfectly until after I
installed the last package, which is the main core of our product, very large
and old with a lot of features. This makes me very confident that Oracle and
nsoracle are working fine. The problem could be some API call in our custom
package that maybe changed in 4.99.25.
To answer some of your questions:
* did you run at this state any Oracle queries? Yes, I did. I'm 95%
confident that Oracle and nsoracle are working fine.
* did you recompile in the "clean install" also the oracle driver? Yes, I'm
building nsoracle from scratch (I am also running the same version of nsoracle
in the 4.99.24 build that is working without issue)
* you mean the crash happens in the plain openacs-config.tcl, with no
additional drivers etc, no oracle involved? No, this does use Oracle, sorry for
not being clear. We have our own heavily modified config file, so I wanted to
rule that out by using the openacs-config.tcl that you provide. I just changed
the database to Oracle and left everything else as is. The fact that it crashed
too means that I can eliminate some strange configuration setting in our custom
config file as a possible cause.
* My request in the last mail was to try to reproduce the problem with
nsd-config.tcl (i.e. no OpenACS involved). Yes, I replied previously that it
runs fine. And also a simple OpenACS install on Oracle runs fine. The problems
only start with our custom OpenACS package.
* To be on the safe side, all /usr/local/ns/bin/*.so files should be newly
compiled. Yes, these all appear to be freshly compiled.
# ls -l /usr/local/ns/bin/*.so
-rwxr-xr-x 1 nsadmin nsadmin 32560 Aug 10 15:31 /usr/local/ns/bin/nscgi.so
-rwxr-xr-x 1 nsadmin nsadmin 27360 Aug 10 15:31 /usr/local/ns/bin/nscp.so
-rwxr-xr-x 1 nsadmin nsadmin 15808 Aug 10 15:31 /usr/local/ns/bin/nsdb.so
-rwxr-xr-x 1 nsadmin nsadmin 50808 Aug 10 15:31 /usr/local/ns/bin/nsdbpg.so
-rwxr-xr-x 1 nsadmin nsadmin 16176 Aug 10 15:31 /usr/local/ns/bin/nsdbtest.so
-rwxr-xr-x 1 nsadmin nsadmin 32640 Aug 10 15:31 /usr/local/ns/bin/nslog.so
-rwxr-xr-x 1 nsadmin nsadmin 90688 Aug 10 15:42 /usr/local/ns/bin/nsoracle.so
-rwxr-xr-x 1 nsadmin nsadmin 90848 Aug 10 15:42 /usr/local/ns/bin/nsoraclecass.so
-rwxr-xr-x 1 nsadmin nsadmin 31712 Aug 10 15:31 /usr/local/ns/bin/nsperm.so
-rwxr-xr-x 1 nsadmin nsadmin 15888 Aug 10 15:31 /usr/local/ns/bin/nsproxy.so
-rwxr-xr-x 1 nsadmin nsadmin 16536 Aug 10 15:31 /usr/local/ns/bin/nssock.so
-rwxr-xr-x 1 nsadmin nsadmin 26624 Aug 10 15:31 /usr/local/ns/bin/nsssl.so
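The same freshness check can be scripted instead of read off the listing by eye. The Python sketch below rests on assumptions: stale_modules is a hypothetical helper, and the choice of reference file (for example the nsd binary in the same directory) is a guess at what "newly compiled" should be compared against. A module with an older modification time than the reference is only a hint that it was not rebuilt, not proof of an incompatible build.

```python
from pathlib import Path


def stale_modules(module_dir, reference):
    """Return names of *.so files in module_dir older than the reference file."""
    ref_mtime = Path(reference).stat().st_mtime
    return sorted(
        p.name
        for p in Path(module_dir).glob("*.so")
        if p.stat().st_mtime < ref_mtime
    )


if __name__ == "__main__":
    # Both paths are assumptions based on the listing above.
    module_dir = Path("/usr/local/ns/bin")
    reference = module_dir / "nsd"
    if reference.exists():
        print(stale_modules(module_dir, reference))
```

An empty list means every module is at least as new as the reference file.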
So my next steps are to try to reproduce the problem using your install-ns.sh
script. Then I can compile with debugging and have some fun with gdb.
thanks
Brian
________________________________
From: Gustaf Neumann <[email protected]>
Sent: Saturday 12 August 2023 11:55 am
To: [email protected]
<[email protected]>
Subject: Re: [naviserver-devel] Crashing on all versions >4.99.24 on Ubuntu
On 11.08.23 20:15, Brian Fenton wrote:
> Hi Gustaf
> thanks for the response. I've been looking at this in more detail this
> afternoon and it does appear to be caused by something in the interaction of
> our OpenACS application with 4.99.27. As I previously mentioned, it has been
> running fine on 4.99.24 on the same Ubuntu version. I realise that I may not
> have been clear on this point in my previous email: this is Naviserver running
> on Ubuntu in a Docker container. The version of Naviserver is based on this
> Docker build https://github.com/oupfiz5/naviserver-s6 which I have forked and
> updated to 4.99.27 (I may well have missed something in updating NS version -
> maybe I should have waited until oupfiz updates his build).
> * I can confirm that nsd-config.tcl runs fine with 4.99.27
> * Some good news: I am able to do an OpenACS clean install on Oracle with
> 4.99.27. I then successfully installed our application using the APM.
did you run at this state any Oracle queries?
did you recompile in the "clean install" also the oracle driver?
> * However, once I restart Naviserver the problems start.
> * I tried using the openacs-config.tcl that ships with 4.99.27 and the
> problems are happening with that too.
you mean the crash happens in the plain openacs-config.tcl, with no additional
drivers etc, no oracle involved?
this can get us closer to something i might be able to reproduce. My request in
the last mail was to try to reproduce the problem with nsd-config.tcl (i.e. no
OpenACS involved). If you can reproduce the crash, you should compile with
debugging turned on and run nsd under gdb or lldb. First, one should find the
simplest case that causes the crash.
> What is odd is that it seems to be able to handle one request before crashing.
> E.g. I type in the URL, it shows the /register page but then crashes. After
> restarting, I enter my login details on the register page, press return. It
> then crashes. After restarting, it successfully logs me in, then crashes again.
the memory errors are normally hinting at some buffer overflow, or a mixture
of 32-bit and 64-bit compilation, etc.
> There is no clear pattern in the logs. I thought it might be related to OCSP
> and disabled that, but the problems continued to occur.
if you suspect nsssl, then one potential problem might be a mixture of
different OpenSSL versions during compilation (when using install-ns.sh, this
will not happen).
> Turning on debug hasn't helped - but maybe there is so much information in the
> log that I have missed something important.
> What drivers are you referring to in your question?
actually all naviserver modules you are using, including the db drivers (since
you mentioned nsoracle, which is not part of the regular regression tests). To
be on the safe side, all /usr/local/ns/bin/*.so files should be newly compiled.
all the best
-gn
> thanks
> Brian
________________________________
From: Gustaf Neumann <[email protected]>
Sent: Thursday 10 August 2023 7:27 pm
To: [email protected]
<[email protected]>
Subject: Re: [naviserver-devel] Crashing on all versions >4.99.24 on Ubuntu
Hi Brian,
The new NaviServer versions are running fine on Ubuntu 22.04. Have you
recompiled the drivers you are using with the updated version?
A good test for the NaviServer binary is to test it with one of the packaged
configuration files, e.g. nsd-config.tcl.
all the best
-gn
On 10.08.23 18:23, Brian Fenton wrote:
Hello
we have been testing out our OpenACS application on Ubuntu 22.04.2 LTS
(previously we only ran on Windows). It was working great with Naviserver
4.99.24 but I have been getting constant crashes on more recent versions.
I get this error on 4.99.25, 4.99.26 and today I also got it on 4.99.27. The
server runs fine until I click on a page, then it immediately crashes.
The log has only the following error:
free(): invalid size
and today I got this one:
[10/Aug/2023:15:02:23][303.7fa3a64ee640][-conn:openacs:default:1:119-] Fatal:
received fatal signal 11
We have an Oracle application and are using the latest nsoracle driver, which
might be a factor here.
We have been running it with a pretty old OpenACS config file, so I am
currently looking to merge in all the latest changes to ensure that is not an
issue.
Also note that I am running Naviserver on Docker on Windows, but as mentioned
it was running great on 4.99.24.
thanks for any help
Brian
_______________________________________________
naviserver-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"