Hi Gustaf

thanks to your help, I was able to track down the source of the problem: it was 
a TCL/ADP pair that generates some Javascript. This gets called in the blank 
master and so was breaking on every page request. Once I removed it from the 
blank master I was able to access the site. After looking into further, I 
discovered a few more places where the same problem occurs: a few more 
Javascript pages generated from TCL/ADP pairs and also a file download TCL page.

What all of these have in common are calls like this - removing these lines 
prevents the crash in each case:
 ns_set update [ns_conn outputheaders] content-type "application/javascript"
 ns_set update [ns_conn outputheaders] content-type "$mime_type"

I created a simple test case which reproduces the problem on our OpenACS system 
i.e browsing to this page immediately crashes Naviserver.
However the same test case runs fine without crashing on the simple 
nsd-config.tcl system.
It also runs fine on a fresh OpenACS Oracle installation - this again suggests 
an issue specific to our application.

I'm not sure how much deeper I can go here - would be grateful for your 
thoughts.

This is my test case - if I create a file called test.adp and browse to it, 
Naviserver crashes.
<html>
<body>
   Welcome to NaviServer
   <%=[ns_info patchlevel]%> under
   <%=[set . "$::tcl_platform(os) $::tcl_platform(osVersion)"]%>
   <%=[ns_set update [ns_conn outputheaders] content-type "text/html"]%>
 </body>
</html>

regards
Brian

________________________________
From: Gustaf Neumann <neum...@wu.ac.at>
Sent: Wednesday 16 August 2023 7:31 pm
To: Navidevel <naviserver-devel@lists.sourceforge.net>
Subject: Re: [naviserver-devel] Crashing on all versions >4.99.24 on Ubuntu

On 16.08.23 15:37, Brian Fenton wrote:
We don't issue any "ns_set cleanup" ourselves, but when I added your 
recommendation of tracing "ns_set", I saw plenty in the logs. The docs say that 
"This command is autoamtically executed by ns_cleanup, which runs after every 
request, freeing all sets created via ns_set", so I presume that is where the 
calls come from.

Hi Brian,

Yes, it is clear from the log that the crash happens in the automatic cleanup, 
but I just wanted to make sure, that the application has not tried the same 
earlier - this would be a good place to start debugging.

Is there something in particular I should be looking for in the trace output? 
There is an enormous amount of information the logs.

see below


I ran another backtrace and this time the error was a little bit different
as i said before, the messages one gets from the operating system are not 
always helpful to find the source of the problem.

("error: Cannot access memory at address 0x2" - see below).

This is actually not a message of the operating system, but a message from gdb 
about an unused C-structure, which is here not relevant.

... but we are getting closer.  The crash happens on an dynamic ns_set (in 
yesterday's mail in "d6").

#11 0x00007ffff7f3e119 in NsTclSetObjCmd
(clientData=0x7fffdc035570, interp=0x7fffdc005250, objc=2,
objv=0x7fffdc245650) at tclset.c:330

        key = 0x7fffdc3aa070 "d6"

In this case, one has to watch out for operations on "d6". You might ask "What 
is a dynamic ns_set?".

Background: ns_sets might be

- C-only structures, or
- Tcl-exposed structures, which might be
   * dynamic (i.e., freed after every request, the Tcl-name of these starts 
with a "d"), or
   * static (the structures will be reused).

The Tcl-exposed ns_sets are "entered" (NaviServer terminology) into Tcl 
interpreters. It might be the case, that you do not find anything interesting 
in the log concerning the crashing ns_set (here "d6") in the Tcl trace output, 
... since the ns_sets are as well manipulated from C.

In case, you find nothing revealing on "d6", I've added more debugging output 
for C-level ns_set operations. Get the newest version from the branch 
"release/4.99", activate debug output on Debug(nsset)

    ns_logctl severity Debug(nsset) on

and then you will see some more output, like e.g. the following:

[16/Aug/2023:19:53:13.016627][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [29] key 'item_id' value '1066' size -1
[16/Aug/2023:19:53:13.016630][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [30] key 'revision_id' value '163776' size -1
[16/Aug/2023:19:53:13.016633][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [31] key 'publish_date' value '2021-11-13 
13:35:27.423904+01' size -1
[16/Aug/2023:19:53:13.016637][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [32] key 'modifying_user' value '' size -1
[16/Aug/2023:19:53:13.016640][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [33] key 'last_modified' value '2017-08-08 
13:26:49.138414+02' size -1
[16/Aug/2023:19:53:13.016644][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [34] key 'modifying_ip' value '::1' size -1
[16/Aug/2023:19:53:13.016647][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [35] key 'modifying_user' value '704' size -1
[16/Aug/2023:19:53:13.016651][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetPut 0x600002845600 [36] key 'parent_id' value '1064' size -1
[16/Aug/2023:19:53:13.016655][-conn:oacs-head:default:0:11-] Debug(nsset): 
EnterSet 0x600002845600 with name 'd15'
[16/Aug/2023:19:53:13.016678][-conn:oacs-head:default:0:11-] Notice: trace: 
ns_set array d15

...
[16/Aug/2023:20:04:31.283342][-conn:oacs-head:default:0:11-] Debug(nsset): 
ns_set cleanup key <d15> dynamic 1
[16/Aug/2023:20:04:31.283344][-conn:oacs-head:default:0:11-] Debug(nsset): 
Ns_SetFree 0x600002845600 'db': elements 37
[16/Aug/2023:20:04:31.283346][-conn:oacs-head:default:0:11-] Debug(nsset): ... 
0: key <object_type> value <::xowiki::Form>
[16/Aug/2023:20:04:31.283349][-conn:oacs-head:default:0:11-] Debug(nsset): ... 
1: key <package_id> value <1060>
[16/Aug/2023:20:04:31.283351][-conn:oacs-head:default:0:11-] Debug(nsset): ... 
2: key <creation_user> value <704>
...



In this example output, there is a bunch of Ns_SetPut() operations on a set 
"0x600002845600"  which is entered then to Tcl as set "d15".... somewhere later 
it is freed via "ns_set cleanup".

Let me know if you need some help on getting/compiling a branch of NaviServer 
from Bitbucket.

all the best

-g



_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel

Reply via email to