Re: Random lockups on an email server - possibly kern/50168

D'Arcy J.M. Cain Sun, 27 Mar 2016 09:22:33 -0700

On Wed, 16 Mar 2016 14:43:33 -0600 (MDT)
Swift Griggs <swiftgri...@gmail.com> wrote:
> Have you tried running a kernel with DDB enabled ? If the machine
> will handle it, horsepower-wise, I'd turn on that and make sure all
> your debugging symbols are rolled up into your kernel image (i.e. cc
> -g which you can set by un-commenting the makeoption in your kernel
> config). Then when the thing falls over again, get a backtrace from
> it.
> 
> If you get really angry and motivated you might try a serial line
> kernel debugger since it seems the bug might lock up the keyboard and
> mouse. There are some instructions for that here:
> 
> http://www.netbsd.org/docs/kernel/kgdb.html


Thanks to everyone for the suggestions.  My problem here is that it is
happening on a production server and I can't take too much time when it
crashes to examine things.  However, I am finding things in the running
server that don't make sense to me so maybe there are some clues here
that I can find without letting it crash.  Currently I am rebooting
during quiet times before it crashes by itself.

I am running this script:

#! /bin/sh

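# Sum of the resident set sizes that ps reports (kilobytes).
PS="`ps -ax -orss | awk '{ sum += $1} END {print sum}'`"
# "used" column of the Mem: line, divided by 1024 to match ps.
PROC="`grep '^Mem:' /proc/meminfo | awk '{print $3/1024}'`"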

printf "PS: %14d\n" $PS
printf "PROC: %12d\n" $PROC

When the system first starts the output looks like this:

PS:         624040
PROC:       613664

I would have thought that the two numbers would be closer together but
it gets even worse.  After running for a few days it gets wildly
divergent and the total from ps is an order of magnitude smaller than
what /proc/meminfo shows.
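
To put a number on that divergence, the two totals could be reduced to a
single ratio and logged at intervals. The following is only a sketch, not
part of the script above: the function names sum_rss, mem_used_kb and ratio
are made up for illustration, and it assumes the same "Mem:" line layout in
/proc/meminfo that the script above relies on.

```shell
#! /bin/sh

# Sum the first column of stdin, skipping the non-numeric "RSS" header.
sum_rss() {
    awk '$1 ~ /^[0-9]+$/ { sum += $1 } END { printf "%d\n", sum }'
}

# Print the third field of the "Mem:" line on stdin, divided by 1024.
# The pattern requires the colon right after "Mem", so "MemTotal:" is ignored.
mem_used_kb() {
    awk '/^Mem:/ { printf "%d\n", $3 / 1024 }'
}

# Print $1 / $2 to two decimal places, or "n/a" if $2 is not positive.
ratio() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { if (b > 0) printf "%.2f\n", a / b; else print "n/a" }'
}
```

It might be used as ps -ax -orss | sum_rss and mem_used_kb < /proc/meminfo,
with the two results passed to ratio and appended to a log file along with
the output of date. A ratio that keeps falling between reboots would show
how quickly the memory not accounted for by processes is growing.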

The other thing I see is that swap never gets used.  The system appears
to crash when real memory gets used up and swap used is always 0.

I am on amd64 in case it matters.

-- 
D'Arcy J.M. Cain <da...@netbsd.org>
http://www.NetBSD.org/ IM:da...@vex.net
