On Wed, 16 Mar 2016 14:43:33 -0600 (MDT) Swift Griggs <swiftgri...@gmail.com> wrote:
> Have you tried running a kernel with DDB enabled? If the machine
> will handle it, horsepower-wise, I'd turn that on and make sure all
> your debugging symbols are rolled up into your kernel image (i.e. cc
> -g, which you can set by un-commenting the makeoption in your kernel
> config). Then when the thing falls over again, get a backtrace from
> it.
>
> If you get really angry and motivated you might try a serial line
> kernel debugger, since it seems the bug might lock up the keyboard and
> mouse. There are some instructions for that here:
>
> http://www.netbsd.org/docs/kernel/kgdb.html
Thanks to everyone for the suggestions. My problem is that this is
happening on a production server, and I can't spend much time examining
things when it crashes. However, I am finding things on the running
server that don't make sense to me, so maybe there are some clues I can
find without letting it crash. For now I reboot it during quiet times,
before it crashes on its own.

I am running this script:

    #!/bin/sh
    # Sum the resident set size (RSS, in KiB) of every process.
    # The "RSS" header line that ps prints adds 0 to the sum.
    PS=$(ps -ax -o rss | awk '{ sum += $1 } END { print sum }')
    # "used" column of the Mem: line in /proc/meminfo, converted from
    # bytes to KiB and truncated to an integer so printf %d accepts it.
    PROC=$(awk '/^Mem:/ { print int($3 / 1024) }' /proc/meminfo)
    printf "PS: %14d\n" "$PS"
    printf "PROC: %12d\n" "$PROC"

When the system first starts, the output looks like this:

    PS:         624040
    PROC:       613664

I would have expected the two numbers to be closer together, and it only
gets worse. After the system has been running for a few days they
diverge wildly, and the total from ps is an order of magnitude smaller
than what /proc/meminfo shows.

The other thing I see is that swap never gets used. The system appears
to crash when real memory is used up, and swap used is always 0. I am on
amd64, in case it matters.

-- 
D'Arcy J.M. Cain <da...@netbsd.org>
http://www.NetBSD.org/ IM:da...@vex.net
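To follow how the ps total and the /proc/meminfo figure drift apart over several days, and whether swap ever moves off 0, without waiting for a crash, the same comparison could be run periodically from cron and appended to a log. This is a minimal sketch, not a tested tool. It assumes that NetBSD's Linux-compatible /proc/meminfo has "Mem:" and "Swap:" rows whose columns are total, used and free in bytes. The script name memlog.sh, the "sample" argument and the helper function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical memlog.sh: log the ps RSS total next to the "used" figures
# from /proc/meminfo. ASSUMPTION: the old-style meminfo format, with
# "Mem:" and "Swap:" rows laid out as total, used, free, ... in bytes.

# Sum the first column of stdin (RSS in KiB). Non-numeric lines such as
# the "RSS" header that ps prints count as 0 in awk.
sum_rss() {
    awk '{ sum += $1 } END { printf "%d\n", sum }'
}

# Print the "used" column (third field) of the given row ("Mem" or
# "Swap") from meminfo text on stdin, converted from bytes to KiB.
meminfo_used_kb() {
    awk -v row="$1:" '$1 == row { printf "%d\n", $3 / 1024 }'
}

# Append one timestamped sample line to stdout.
log_sample() {
    ps_kb=$(ps -ax -o rss | sum_rss)
    mem_kb=$(meminfo_used_kb Mem < /proc/meminfo)
    swap_kb=$(meminfo_used_kb Swap < /proc/meminfo)
    printf '%s PS: %s PROC: %s SWAP: %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$ps_kb" "$mem_kb" "$swap_kb"
}

# Take a sample only when invoked as "memlog.sh sample", so the helper
# functions above can also be sourced and reused on their own.
if [ "${1:-}" = "sample" ]; then
    log_sample
fi
```

For example, a crontab entry along these lines would record a sample every 15 minutes (the install path and log file are only placeholders):

    */15 * * * * /usr/local/sbin/memlog.sh sample >> /var/log/memlog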