1) The problem appears to be independent of the kernel version, as I've had it
occur on both a 2.6.10 and a 2.6.12 kernel.

2) How might I check for flaky hardware?

I would guess a hardware problem (unless 3, below, applies), but actually finding the errant component can be quite a task. On a desktop you can strip the machine down to the bare minimum, let it run, add one component, let it run again, and repeat until you find the one that causes the crash. Even that isn't fully reliable, though, since the crash could be caused either by that component or by an interaction between components.

It sounds like you have a laptop, which makes that approach harder. Did it come with any diagnostic tools, ones that can test the hardware components and report errors?

3) After 3 of the crashes, my BIOS reported that the computer had crashed due to excessive heat. I think this may be independent of the problem as well,
because I haven't seen this BIOS message in conjunction with a crash for
several months. I've also had a crash occur when I flipped my laptop
upside-down and placed an ice pack over the portion that produced the most
heat.

Heat can really be an issue, especially for laptops. And an ice pack on the outside wouldn't necessarily keep every component inside below its threshold, so a crash under the ice pack doesn't rule heat out.

Once I have this information, we can go ahead and figure out why my kernel
keeps crashing. But first, I have to figure out how to trace my kernel's oops message. Without that information, the above answers don't really mean much.

If you could please help me to figure out a way to log old kernel messages and
find them on subsequent boots, that would be most appreciated.

Depending on the fault, if it is hardware related, you might never get any worthwhile information out of the kernel, even if you could log it. If the computer just locks up (due to heat or a hardware fault), it does so without giving the kernel time to log anything of value.

I would try to rule out heat as the problem first. If your laptop is a newer model, you should be able to read the on-board temperature sensors (there was a recent thread on that on this list, and I am by no means an expert on it). Poll them from a cron job to collect readings over time; that way you can see the temperatures right before a crash. If they don't really change leading up to the crash, you can probably rule heat out.
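The cron idea could look something like the Python script below. It is a minimal sketch under stated assumptions: it assumes the kernel exposes sensors as /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius (older kernels such as 2.6.10/2.6.12 typically put ACPI zones under /proc/acpi/thermal_zone/*/temperature in a different text format, so the pattern and parsing would need adjusting there), and /var/log/temp-watch.log is a hypothetical log path. Each run appends one timestamped line and calls fsync, so the last reading before a lockup has a better chance of actually being on disk.

```python
import glob
import os
import time

# Hypothetical log location; any path on a persistent filesystem works.
LOG_PATH = "/var/log/temp-watch.log"

# Assumed sysfs layout: one file per zone holding millidegrees Celsius.
ZONE_PATTERN = "/sys/class/thermal/thermal_zone*/temp"


def read_zones(pattern=ZONE_PATTERN):
    """Return {zone_name: degrees_celsius} for every readable sensor file."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                readings[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # Skip zones that vanish or report something non-numeric.
            continue
    return readings


def format_line(readings, now=None):
    """Build one log line: a timestamp followed by zone=temp pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{zone}={temp:.1f}C" for zone, temp in sorted(readings.items()))
    return f"{stamp} {fields or 'no-sensors'}"


def append_line(path, line):
    """Append a line and force it to disk before returning."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    append_line(LOG_PATH, format_line(read_zones()))
```

A crontab entry such as `* * * * * /usr/bin/python3 /usr/local/bin/temp_watch.py` (script path hypothetical) would take a reading every minute. After the next crash, the lines just before the gap in timestamps show the temperatures leading up to it.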

If it is a hardware problem, you're stuck with whatever diagnostics the vendor provided. I'm not certain there are any diagnostic tools under Linux that would do any of this for you. The vendor will probably thumb their nose at you, since they shipped it with Windows and you're running an 'unsupported' OS. Perhaps there's some helpful middleman out there who handles hardware issues on laptops running Linux, but that would be a paid service.

--
gentoo-user@gentoo.org mailing list