Dear David,
Technically, every Unix fork() creates a new process by duplicating the
calling process. If the calling process is large, this can lead to the
problem you mention.
However, Linux uses copy-on-write. This means that although both
processes might show an RSS of 4GB after the fork, it does not mean that
8GB are really used. Still, depending on the overcommit_memory setting,
the OOM killer might kick in.
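To see how much of the RSS is actually shared between parent and child, one can compare Rss with Pss (proportional set size, where each shared page is divided among the processes mapping it). The following is only a minimal sketch: the helper names are made up for illustration, and it assumes a Linux /proc filesystem with a readable /proc/<pid>/smaps.

```python
from pathlib import Path


def sum_smaps_fields(smaps_text, fields=("Rss", "Pss")):
    """Sum the given per-mapping fields (values in kB) from smaps-style text."""
    totals = {name: 0 for name in fields}
    for line in smaps_text.splitlines():
        key, _, rest = line.partition(":")
        if key in totals:
            # Lines look like "Rss:                 8 kB"
            totals[key] += int(rest.split()[0])
    return totals


def rss_and_pss_kb(pid):
    """Return {'Rss': ..., 'Pss': ...} for a pid, or None if smaps is not available."""
    smaps = Path(f"/proc/{pid}/smaps")  # assumption: Linux, sufficient permissions
    if not smaps.exists():
        return None
    return sum_smaps_fields(smaps.read_text())
```

If the forked copy shows a Pss far below its Rss, most of its pages are still shared with the parent and are not consuming additional memory.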
Concerning your configuration: are you using nsproxy? This module was made
to address this issue: instead of forking on every Tcl "exec", the command
is sent via a pipe to a second process that executes it. This process
typically has a much smaller memory footprint, so the problem does not get
worse when nsd has a huge memory footprint.
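The general idea can be illustrated outside of NaviServer. The following is a rough Python analogue, not the nsproxy implementation or its wire protocol: the class ExecHelper and the JSON-lines message format are invented for this sketch. A small helper process is started once, and later commands are sent to it over a pipe, so the process that forks for each command is the small helper rather than the large server.

```python
import json
import subprocess
import sys

# Source of the small helper: it reads one JSON request per line,
# runs the command, and answers with one JSON line.
WORKER_SOURCE = r'''
import json, subprocess, sys
for line in sys.stdin:
    request = json.loads(line)
    done = subprocess.run(request["argv"], input=request["stdin"],
                          capture_output=True, text=True)
    print(json.dumps({"code": done.returncode, "stdout": done.stdout}),
          flush=True)
'''


class ExecHelper:
    """Hypothetical helper; start it early, while the parent is still small."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def run(self, argv, stdin=""):
        # Send the command over the pipe instead of forking the caller.
        self.proc.stdin.write(json.dumps({"argv": argv, "stdin": stdin}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        self.proc.stdin.close()  # EOF ends the helper's read loop
        self.proc.wait()
```

A call such as `ExecHelper().run(["sort"], "b\na\n")` would then play the role of `exec $cmd << $input`, with the fork happening in the helper.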
Do you monitor the size of nsd over time? Is it normal for nsd to have
4GB RSS with the given configuration? There might be a problem in the
application code causing memory growth. The chart below is
generated with munin and the munin-plugins-ns from
https://github.com/gustafn/munin-plugins-ns (some plugins are for OpenACS,
some are fine for every NaviServer installation).
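If munin is not available, a quick way to get a similar picture is to sample the RSS of nsd periodically. The following is a minimal sketch, not one of the munin plugins: the function names are invented here, and it assumes a Linux /proc filesystem.

```python
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:   4194304 kB"
            return int(line.split()[1])
    return None


def current_rss_kb(pid):
    """Read the current RSS of a process, or None if it cannot be determined."""
    status = Path(f"/proc/{pid}/status")  # assumption: Linux
    if not status.exists():
        return None
    return parse_vmrss_kb(status.read_text())
```

Logging the result of `current_rss_kb` for the nsd pid every few minutes (e.g. from cron) is enough to tell steady growth from a sudden jump.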
Do you use system malloc? One can reduce the size of the memory
footprint significantly by using SYSTEM_MALLOC together with
TCMalloc (see e.g.
https://next-scripting.org/2.3.0/doc/misc/thread-mallocs).
The following chart shows the effects on openacs.org when I switched
to SYSTEM_MALLOC + TCMalloc around August (same code, same
number of requests, etc.).
-g
yearly graph
On 01.11.19 18:02, David Osborne wrote:
Hi,
I was wondering if anyone could point us in the right direction with
an intermittent problem we're seeing. Apologies for the woolly problem
description - we don't have a firm test case as of yet.
These instances are running NaviServer/4.99.16d10 on Debian Jessie 8.10.
The problem usually manifests itself as memory exhaustion on the
server in question where the system's OOM killer is invoked.
What seems to be causing the memory exhaustion is a copy of the main
nsd process (sometimes several) which can use large amounts of memory.
For example, we captured a snapshot of this copy of the nsd daemon
(also using 4GB RSS) after it had been running for about 5 hours.
image.png
It appears as if there are two main nsd daemons running. Usually only
one nsd daemon is running on this server.
When this child was killed by the OOM killer, it was a Tcl exec of an
external command which was running within it - not a command that
would normally take 5 hours to complete.
child killed: kill signal
while executing
"exec $cmd << $input"
In testing, when running an exec from NaviServer, I see the forked
process being created and initially named "nsd", then it takes on the
name of the underlying command. I'm not sure why this isn't happening
in these cases.
Any insights?