1) The problem appears to be independent of the kernel version, as I've
had it occur on a 2.6.10 and 2.6.12 kernel.
2) How might I check for flakey hardware?
I would guess a hardware problem (unless 3 below applies), but actually
finding the errant component can be quite a task. For a desktop you can
strip down to the bare minimum, let it run, add a component, let it run, and
repeat until you find one that causes the crash. Even that's not reliable,
though, since the crash might be due either to the component itself or to
interactions between components.
Sounds like you have a laptop, which makes that scenario harder. Did it come
with any diagnostic tools, ones that know how to check out the hardware
components and look for errors?
3) I have had my BIOS respond after 3 crashes that the computer crashed
due to excessive heat. I think that this may be independent of the problem
as well, because I haven't had this BIOS message in conjunction with a crash
for several months. I've also had a crash occur when I flipped my laptop
upside-down and placed an ice pack over the portion that produced the most
heat.
Heat can really be an issue, especially for laptops. And the icepack
wouldn't necessarily keep all of the components inside below the threshold
when the crash occurs, if it is heat related.
Once I have this information, we can go ahead and figure out why my kernel
keeps crashing. But first, I have to figure out how to trace my kernel's
oops message. Without that information, the above answers don't really mean
much.
If you could please help me to figure out a way to log old kernel messages
and find them on subsequent boots, that would be most appreciated.
Depending upon the fault that occurs, if it is hardware related, you might
never get any worthwhile information out of the kernel even if you could
capture its messages. If the computer just locks up (due to heat or
hardware), it would do so without giving the kernel time to log anything
that might be of value.
I guess I would try to rule out heat as the problem first. If your laptop
is a newer model, you should be able to access the on-board temperature
sensors (there's been a recent thread on that on the list, and I am by no
means an expert on it). Read them from a cron task to collect info over
time. That way you should be able to see the temp values right before a
crash kicks in; if they don't really change, you can probably rule heat out
as the issue.
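As a rough sketch of that cron-driven logging, something like the Python
below could work. It rests on assumptions that are not from this thread: that
the kernel exposes the sensors under /sys/class/thermal/thermal_zone*/temp in
millidegrees Celsius (older 2.6 kernels may only offer something under
/proc/acpi/thermal_zone instead, with a different format), and that
/var/log/temp-watch.log is an acceptable log location. The function names
are made up for illustration.

```python
"""Append one timestamped temperature reading per thermal zone to a log."""
import glob
import os
import time

# Assumption: sysfs thermal zones reporting millidegrees Celsius.
SENSOR_GLOB = "/sys/class/thermal/thermal_zone*/temp"
# Hypothetical log location; pick whatever suits your system.
LOG_PATH = "/var/log/temp-watch.log"


def read_temps(pattern=SENSOR_GLOB):
    """Return {sensor_path: degrees_celsius} for every readable sensor."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip sensors that cannot be read or parsed
    return temps


def format_line(timestamp, temps):
    """Build one log line: local time followed by zone=temperature pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    fields = " ".join(
        "%s=%.1fC" % (os.path.basename(os.path.dirname(path)), temp)
        for path, temp in sorted(temps.items())
    )
    return "%s %s\n" % (stamp, fields or "no-sensors-found")


def log_once(log_path=LOG_PATH, pattern=SENSOR_GLOB):
    """Take one reading and append it to the log, forcing it to disk."""
    line = format_line(time.time(), read_temps(pattern))
    with open(log_path, "a") as f:
        f.write(line)
        f.flush()
        # fsync so the last reading has a chance to survive a hard lockup
        os.fsync(f.fileno())
    return line
```

To use it, add a final line calling log_once() and run the script from a
crontab entry once a minute. After a crash, the last few lines of the log
show what the sensors reported shortly before the machine went down. The
fsync is there because a lockup would otherwise likely lose whatever was
still sitting in the write cache.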
If it is a hardware problem, you're stuck with what the vendor provided.
I'm not certain there are any diagnostic tools under Linux that would do any
of this for you. The vendor's probably going to thumb their nose at you, as
they gave it to you with Windows on it and you're running the 'unsupported'
OS. Perhaps there's some happy middleman out there that handles hardware
issues on laptops with Linux, but that would be a service that would cost
you.
--
gentoo-user@gentoo.org mailing list