Alright! Thanks to Jonathan Wright and his program, I think that I may have found something. See what you guys think:
Before the crash, the following three lines appeared (in this order) nearly 53,000 times for a total of 16MB of text:

> Sep 17 13:45:51 kerwin [4314362.567000] ip_local_deliver: bad skb: PRE_ROUTING LOCAL_IN LOCAL_OUT POST_ROUTING
> Sep 17 13:45:51 kerwin [4314362.567000] skb: pf=2 (unowned) dev=lo len=60
> Sep 17 13:45:51 kerwin [4314362.567000] PROTO=6 127.0.0.1:34134 127.0.0.1:111 L=60 S=0x00 I=15872 F=0x4000 T=64

These messages occurred over the course of two hours before the crash at a rate of more than 20 times per second. They are the messages that appeared just before the crash. Apparently, my computer was trying to tell me something pretty important.

Since _this_ problem (still not sure if it is THE problem that is causing the crashes to occur; as Dave pointed out, it could be hardware or heat as well) appears to be something in networking, I'm going to recompile a kernel without all of the complex networking stuff, but one that includes my ethernet card's driver.

I'll let you know how it goes, and if the problem persists.

Thanks again.

Kris

On Saturday 17 September 2005 14:52, Dave Nebinger wrote:
> > 1) The problem appears to be independent of the kernel version, as I've
> > had it occur on a 2.6.10 and 2.6.12 kernel.
> >
> > 2) How might I check for flakey hardware?
>
> I would guess hardware problem (unless 3 applies below), but actually
> finding the errant component can be quite a task. For a desktop you can
> strip down to bare minimum, let it run, add a component, let it run, and
> repeat until you find one that causes the crash, although that might either
> be due to the component or interactions between components, so even that's
> not reliable.
>
> Sounds like you have a laptop which makes that scenario harder. Did it
> come with any diagnostic tools, ones that know how to check out the
> hardware components and look for errors?
>
> > 3) I have had my BIOS respond after 3 crashes that the computer crashed
> > due to excessive heat.
> > I think that this may be independent of the problem as well,
> > because I haven't had this BIOS message in conjunction with a crash for
> > several months. I've also had a crash occur when I flipped my laptop
> > upside-down and placed an ice pack over the portion that produced the
> > most heat
>
> Heat can really be an issue, especially for laptops. And the icepack
> wouldn't necessarily keep all of the components inside below the threshold
> when the crash occurs, if it is heat related.
>
> > Once I have this information, we can go ahead and figure out why my
> > kernel keeps crashing. But first, I have to figure out how to trace my
> > kernel's oops message. Without that information, the above answers
> > don't really mean much.
> >
> > If you could please help me to figure out a way to log old kernel
> > messages and find them on subsequent boots, that would be most
> > appreciated.
>
> Depending upon the fault that occurs, if it is hardware related, you might
> never get any worthwhile information out of the kernel even if you could
> get this information... If the computer just locks up (due to heat or
> hardware), it would do so w/o giving the kernel time to log anything that
> might be of value.
>
> I guess I would try to rule out heat as the problem first. If your laptop
> is a newer model, you should be able to access the on-board temperature
> sensors (there's been a recent thread on that on the list, and I am by far
> no expert on it). Get them running via a cron task to collect info over
> time, that way you should be able to see the temp values right before a
> crash kicks in; if they don't really change, you can probably rule heat out
> as the issue.
>
> If it is a hardware problem, you're stuck with what the vendor provided.
> I'm not certain there's any diagnostic tools under linux that would do any
> of this for you.
> The vendor's probably going to snub their nose at you as
> they gave it to you with windows on it and you're running the 'unsupported'
> os. Perhaps there's some happy middleman out there that does hardware
> issues on laptops with linux, but that would be a service that would cost
> you.

-- 
[email protected] mailing list
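
For anyone who wants to check their own logs for the same kind of flood, here is a rough sketch in Python of how repeated kernel messages could be tallied. This is not the program Jonathan provided. The function names, the regular expression, and the trick of masking digit runs so that repeats with different ports or IDs group together are all assumptions made for illustration, and the line format is guessed from the three sample lines above.

```python
import re
import sys
from collections import Counter

# Assumed syslog layout, taken from the sample lines in this message:
#   "Sep 17 13:45:51 kerwin [4314362.567000] <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"  # date, time, hostname
    r"\[\s*\d+\.\d+\]\s+"                          # kernel timestamp in brackets
    r"(?P<msg>.*)$"
)


def message_key(msg):
    """Mask every run of digits so repeats that differ only in numbers match."""
    return re.sub(r"\d+", "#", msg)


def count_messages(lines):
    """Return a Counter mapping masked kernel messages to how often they occur."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not look like kernel messages are skipped
            counts[message_key(match.group("msg"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log file path is supplied by the user; its location varies by system.
    with open(sys.argv[1], errors="replace") as handle:
        for key, total in count_messages(handle).most_common(10):
            print(f"{total:8d}  {key}")
```

Saved under a hypothetical name such as count_kmsg.py, it would be run with the log file as its only argument, and it prints the ten most frequent message shapes with their counts. Masking digits hides details like the port number, so the raw lines are still worth reading once a suspicious pattern shows up.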

