On Sat, 13 May 2000, Jonathan Buset wrote:
> The server specs:
>
> 196MB PC100 RAM
Isn't this really 192? (128+64, or 64+64+64.) Or do you actually have a
4MB DIMM in there?
> kernel-2.2.14 (compiled from source before I shipped it)
Was it the official Linus version, or Red Hat's version (which has other
patches)?
> Since these CODE messages are all the same, you said it's probably software
> related. Is there any way
> to track what software is conflicting? I can't run memtest86 because I
> cannot physically access the machine. I don't think it's faulty ram though.
Interesting data. I am not a kernel hacker, only a sysadmin who has lived
through this stuff several times. My ASM is 15 years rusty and for a
different platform, so I am not much help there.
Based on my own past experience, if you are getting OOPSes in exactly the
same routine every time, it is likely to be software. The exception is a
routine that deals with hardware and is OOPSing because of a hardware
problem (a bad SCSI controller, for example).
Since the reboot, have the OOPSes gone away, or are they still happening?
Do you have a list of all of the daemons you were getting an OOPS in? Can
it be reproduced?
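One rough way to check whether the OOPSes really land in the same routine
is to tally them by EIP and process name. Here is a minimal Python sketch,
assuming the oops text has been saved from syslog and follows the usual
2.2 i386 layout (an "EIP: 0010:[<address>]" line followed by a
"Process name (pid: N, ..." line). The function name and the sample
addresses and process names are made up for illustration; they are not
taken from your logs.

```python
import re
from collections import Counter

# Patterns assume the 2.2-era i386 oops layout; adjust if your logs differ.
EIP_RE = re.compile(r"EIP:\s+[0-9a-fA-F]{4}:\[<([0-9a-fA-F]{8})>\]")
PROC_RE = re.compile(r"Process\s+(\S+)\s+\(pid:\s*(\d+)")


def tally_oopses(log_text):
    """Count oopses per (EIP address, process name) pair."""
    counts = Counter()
    current_eip = None
    for line in log_text.splitlines():
        eip = EIP_RE.search(line)
        if eip:
            # Remember the faulting address until the Process line shows up.
            current_eip = eip.group(1).lower()
            continue
        proc = PROC_RE.search(line)
        if proc and current_eip is not None:
            counts[(current_eip, proc.group(1))] += 1
            current_eip = None
    return counts


# Hypothetical sample, for illustration only.
SAMPLE = """\
May 12 03:10:01 host kernel: EIP:    0010:[<c0129f3a>]
May 12 03:10:01 host kernel: Process named (pid: 412, process nr: 18, stackpage=c3a41000)
May 12 09:44:17 host kernel: EIP:    0010:[<c0129f3a>]
May 12 09:44:17 host kernel: Process httpd (pid: 903, process nr: 31, stackpage=c2f0d000)
"""

if __name__ == "__main__":
    for (eip, proc), n in sorted(tally_oopses(SAMPLE).items()):
        print(f"{eip}  {proc}  {n}")
```

If every entry shares one EIP across different daemons, that points at a
single kernel routine, and the address can then be looked up in the
System.map for that kernel build.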
Of course, I am assuming that you are not overclocking the BP6, too. If
so, all bets are off.
This doesn't seem like a BP6-specific issue -- those usually just lock the
machine up hard without even a kernel oops first (at least that's what mine
does every 3-4 weeks).
I'd suggest you post a summary on linux-kernel, using the OOPS report
format found in Appendix B of:
http://www.tux.org/lkml/old-faq.txt
You probably want to read the linux-kernel mailing list FAQ at
http://www.tux.org/lkml/
and in particular the section on how to capture and report an OOPS
http://www.tux.org/lkml/#s4-2
I try to make it a matter of course to set up a serial console on all
"production" Linux boxes I maintain now, with logging of all output to a
really stable separate machine (like a 486 with a 2.0.x kernel -- a good
use for old laptops, actually). Just in case of a kernel oops... a lot
easier than getting a panicked on-site person to copy down an oops from
the screen by hand.
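The logging side can be very simple. Here is a sketch in Python of a
logger that timestamps each line arriving from the console and appends it
to a file. The function name, the device path /dev/ttyS0, and the log file
name are all assumptions for illustration, and the port speed is assumed
to have been set already (with stty, say) to match the console.

```python
import sys
import time


def log_console(stream, out, clock=time.time):
    """Copy lines from a byte stream to a text log, prefixing a timestamp."""
    count = 0
    for raw in stream:
        # Console output is treated as ASCII; odd bytes are replaced, not fatal.
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        out.write(f"{stamp} {text}\n")
        out.flush()  # keep the log current in case this box goes down too
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical defaults; pass the real device and log path as arguments.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/ttyS0"
    logfile = sys.argv[2] if len(sys.argv) > 2 else "console.log"
    with open(device, "rb", buffering=0) as tty, open(logfile, "a") as log:
        log_console(tty, log)
```

Flushing after every line costs a little, but it means the last lines
before a crash are already in the log file.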
Lastly, not to sound like an M$ person, but why not upgrade to the latest
stable kernel? 2.2.15 is out now - release notes at:
http://www.linux.org.uk/VERSION/relnotes.2215.html
Upgrading a remote box can be a little scary, but it can be done. It
helps a great deal to have a working boot disk of the previous kernel
image (appropriately rdev'ed) and someone on site who can insert it. I
don't know if that is an option for your co-location place.
--
James Troutman, Troutman & Associates - telecommunications consulting
93 Main Street, Waterville, Maine 04901 - 207-861-7067