On Nov 3, 2017, at 12:08 PM, Richard Hipp <d...@sqlite.org> wrote:
> 
> On 11/3/17, Olivier R. <m...@grammalecte.net> wrote:
>> 
>> 
>> Sorry. My knowledge of the C toolchain is null.
> 
> The next step will be to figure out
> how to attach the debugger to a hung process.

Problem #1 could be fixed (in principle) without any more help from you, 
Olivier: PIDs 888 and 893 are zombies, meaning Fossil is forking off children 
without calling wait() on them.  That’s why their VIRT column shows as 0 in 
your screenshot: the kernel has stripped all resources from them it can, and is 
holding onto only the exit status and such for the parent’s benefit.  This is a 
bug in Fossil, plain and simple.

That said, zombies are nearly harmless, merely adding noise to the process 
table.  They don’t explain your actual symptom.
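
If you want to confirm the zombie diagnosis yourself, something like the 
following should do it.  This is only a sketch: it assumes a ps that accepts 
-e and -o with these column names (procps on Linux does), and list_zombies is 
just a name I made up for the helper.

```shell
# Print the ps header plus every process whose state starts with "Z" (zombie).
# Assumption: ps supports -e and -o pid,ppid,stat,comm (true for procps).
list_zombies() {
    ps -e -o pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
}

list_zombies
```

The PPID column in that output tells you which process has failed to call 
wait() on each zombie.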

The remaining PIDs are almost certainly a single parent with multiple children.  
You’d have to run top in “tree” mode or show the PPID column to find out which 
one is the parent.  Even without doing that, the relationship shows in the fact that all of the 
VIRT column values are identical, meaning that within the limits of top’s 
reporting resolution, the children are allocating no dynamic virtual memory of 
their own, which is what we’d expect from a forking HTTP child-per-conn model.
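
If you’d rather see the parent/child relationship directly than infer it from 
VIRT, ps can show the PPID next to each PID.  A rough sketch, assuming the 
binary’s process name is “fossil” and that your ps accepts these column names; 
show_family is a made-up helper name:

```shell
# Show PID, parent PID, virtual size (KiB) and state for one process name.
# Assumption: ps supports -e and -o pid,ppid,vsz,stat,comm (true for procps).
show_family() {
    ps -e -o pid,ppid,vsz,stat,comm | awk -v name="$1" 'NR == 1 || $5 == name'
}

show_family fossil
```

The parent is the one PID that appears in the PPID column of the others.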

Given all of that, I’d just pick one of the PIDs and attach to it:

    $ gdb -p 26819

If that works, say “bt” when attached, then “quit” to detach again.  Post the 
backtrace output here, Olivier.
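
If typing into an interactive debugger is awkward, gdb can also do the attach, 
backtrace, and detach in one shot with its -batch and -ex options.  A sketch, 
with dump_bt and the output file name being my own inventions:

```shell
# Attach to the given PID, print a backtrace, then detach and exit.
# Output (including any permission errors) goes to bt-<pid>.txt.
dump_bt() {
    pid=$1
    gdb -p "$pid" -batch -ex bt > "bt-$pid.txt" 2>&1
}
```

You would then say “dump_bt 26819” and post the contents of bt-26819.txt.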

If it doesn’t work, it’s probably due to lack of debugging permission on the 
target system, in which case you’ve got some sysadminning ahead of you, which 
is not on topic here.

But, this does not look like a madly-spinning system.  The CPU is idle and the 
PIDs are pretty far apart.

Basically, it’s looking like each one is the result of an HTTP transaction and 
the child just isn’t dying at transaction end as it should.  This should only 
be a serious problem when the children collectively hold so many resources that 
the system can’t run properly.

Bottom line, I don’t think the top output explains the problem.
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
