> Hot Diggety! Nick Fisher was rumored to have written:
>> Every kernel I have compiled for this one machine has failed. My general
>> test is to recompile the kernel multiple times (as recommended by
>> DRobbins). One of my kernels once made it through three compiles. If I
>> start from the liveCD and chroot into my Gentoo install and compile, it
>> goes for days. So (unless I have missed something) the problem appears
>> to be the kernel. I did think it was hardware for some time but after a few
>> days of CPUburn and Memtest86 I have basically discounted that. If it were
> Don't be too quick to discount it. It is possible to run memtest86 and
> still not detect marginal RAM - a friend was in that boat; memtest86 said
> it was clean but he still replaced it, anyway, and all his problems went
> away when he put in the high quality RAM. He hasn't crashed once since.

Ummmmm... I don't think I was too quick. I have heard that Memtest86 is
not 100% reliable, but the fact that the server runs flawlessly while
continually compiling the kernel from the liveCD really makes me think
it's not hardware. If I run compiles from the kernels I have made, it
generally crashes within an hour. If it were a hardware problem I would
expect to see the same (or nearly the same) results whether I booted from
the CD or the HD, and that's not the case.

>> the hardware I couldn't continuously compile the kernel for over a day
>> chrooted from the liveCD.
>>
>> So from what I can tell there is something *wrong* with the way I'm
>> compiling this kernel or the options I'm setting. When the machine
>> crashes it just stops. No errors, no nothing. It just stops. I have
>> scoured the
> That basically sounds like a hardware issue of some sort.

I agree that it totally sounds like that. I spent quite a bit of time
barking up that tree. However, as I said above, the liveCD performance
really makes me think I was on the wrong track.

> What's the CPU's temperature? (BIOS report or lm_sensors type of report)
> If it's getting too hot, it could be leading to random bit flips which
> results in either a nasty crash or undefined/unpredictable behavior.

Been down that road already. The CPUs heat up some but they max out
after an hour. I should have mentioned that in the last mail.

> If it's just silently powering off, could potentially be some sort of VRM,
> power loading, power supply issue... but that's relatively rare.

Nope, the machine just stops.... dead. Another way to express it would
be to say that it freezes. Nothing works and I'm left looking at the last
thing that was echoed to the console :/

Nick

--
[EMAIL PROTECTED] mailing list
