OK. It happened again while I was away for a few days.
The main process has created 10 subprocesses. Fossil was very slow.
I used gdb on the main process.
While gdb was attached, Fossil didn’t respond to requests for a webpage. It
seemed blocked. Fossil became responsive again a few seconds after I quit
gdb.
This time I didn’t kill the processes. I can try to do something else if
you want.
Here is what gdb said:
-------------------------------------------------------------------
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 17876
Reading symbols from /usr/bin/fossil2...done.
Reading symbols from /usr/local/lib/libz.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/local/lib/libz.so.1
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007fd8ff85e873 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007fd8ff85e873 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1 0x0000000000416162 in cgi_http_server (mnPort=mnPort@entry=8080,
mxPort=mxPort@entry=8180, zBrowser=<optimized out>, zBrowser@entry=0x0,
zIpAddr=zIpAddr@entry=0x0, flags=flags@entry=12) at ./src/cgi.c:1845
#2 0x000000000044f92a in cmd_webserver () at ./src/main.c:2493
#3 0x0000000000407fec in main (argc=<optimized out>, argv=<optimized
out>) at ./src/main.c:760
(gdb) quit
A debugging session is active.
Inferior 1 [process 17876] will be detached.
Quit anyway? (y or n) y
On 06/11/2017 at 18:19, Warren Young wrote:
Problem #1 could be fixed (in principle) without any more help from
you, Oliver: PIDs 888 and 893 are zombies, meaning Fossil is forking
off children without calling wait() on them. That’s why their VIRT
column shows as 0 in your screenshot: the kernel has stripped all
resources from them it can, and is holding onto only the exit status
and such for the parent’s benefit. This is a bug in Fossil, plain
and simple.
That said, zombies are nearly harmless, merely adding noise to the
process table. They don’t explain your actual symptom.
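To illustrate the wait() point above, here is a minimal sketch in C of how a forking server can reap exited children so they do not linger as zombies. This is not Fossil's actual code: reap_finished_children() and spawn_short_lived_children() are hypothetical names invented for this example, and the only system calls used are the standard fork() and waitpid().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of Fossil: collect the exit status of
 * every child that has already terminated, without blocking.
 * Returns the number of children reaped by this call. */
int reap_finished_children(void) {
    int reaped = 0;
    int status;

    /* waitpid(-1, ..., WNOHANG) returns > 0 for each exited child,
     * 0 if children exist but none has exited yet, and -1 if there
     * are no children at all. */
    while (waitpid(-1, &status, WNOHANG) > 0) {
        reaped++;
    }
    return reaped;
}

/* Hypothetical helper for demonstration: fork n children that exit
 * immediately, standing in for one child per HTTP request.
 * Returns the number of children successfully started. */
int spawn_short_lived_children(int n) {
    int started = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0);   /* child: pretend the request is finished */
        }
        if (pid > 0) {
            started++;  /* parent: remember that a child was started */
        }
    }
    return started;
}
```

A server built this way would call reap_finished_children() on each pass through its accept loop, or from a SIGCHLD handler. Until the parent does so, each exited child stays in the process table as a zombie.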
The remaining PIDs are all certainly a single parent with multiple
children. To find out which one is the parent, you’d have to run top
in “tree” mode or show the PPID column. Even without doing that, you
can tell they are related because all of the VIRT column values are
identical, meaning that within the limits of top’s reporting
resolution, the children are allocating no dynamic virtual memory of
their own, which is what we’d expect from a forking HTTP
child-per-conn model.
Given all of that, I’d just pick one of the PIDs and attach to it:
$ gdb -p 26819
If that works, say “bt” when attached, then “quit” to detach again.
Post the backtrace output here, Oliver.
If it doesn’t work, it’s probably due to lack of debugging permission
on the target system, in which case you’ve got some sysadminning
ahead of you, not on topic here.
But, this does not look like a madly-spinning system. The CPU is
idle and the PIDs are pretty far apart.
Basically, it’s looking like each one is the result of an HTTP
transaction and the child just isn’t dying at transaction end as it
should. This should only be a serious problem when the children
collectively hold so many resources that the system can’t run
properly.