On Tue, Aug 12, 2003 at 03:52:34PM -0400, Nick Fisher wrote:
> I have a machine that I cannot compile a stable 2.4.20 kernel for, yet the
> one off of the 1.4_rc2 liveCD works fine. I'm guessing there is an option
> or a patch that is/isn't set/applied. Apart from good old trial and error,
> how the heck do I work out what is giving me the problem?

I've often found the NMI watchdog timer to be extremely helpful with
unexplained kernel lockups. Documentation is in
/usr/src/linux/Documentation/nmi_watchdog.txt. Basically, you append
"nmi_watchdog=1" to the kernel command line from LILO or Grub; in a few
rare cases, you need a value other than 1. When the kernel locks up, the
watchdog detects it and dumps an interesting traceback to the console.
I've been able to correlate that traceback with symbols in /proc/ksyms
and identify malfunctioning drivers.

It works best if you can set up a remote console over the serial port.
If you can't, don't run X, or you won't see the dump when it happens.
Then grab a pen and start writing down addresses :-).

Nathan Meyers
[EMAIL PROTECTED]

> Every kernel I have compiled for this one machine has failed. My general
> test is to recompile the kernel multiple times (as recommended by
> DRobbins). One of my kernels once made it through three compiles. If I
> start from the liveCD, chroot into my Gentoo install and compile, it
> goes for days. So (unless I have missed something) the problem appears to
> be the kernel. I did think it was hardware for some time, but after a few
> days of CPUburn and Memtest86 I have basically discounted that. If it were
> the hardware, I couldn't continuously compile the kernel for over a day
> chrooted from the liveCD.
>
> So from what I can tell there is something *wrong* with the way I'm
> compiling this kernel or the options I'm setting. When the machine crashes
> it just stops. No errors, no nothing. It just stops. I have scoured the
> logs for any kernel panics or segfaults and have found nothing. I even
> remembered to stop caching in Metalog. As an experiment I tried using the
> config from the new and (I find) confusing genkernel. Same result.
>
> The machine is based on a SuperMicro P6DBU (Rev 1.1), BIOS r3.1 (latest).
> It has an Adaptec 29160 (SCSI wide 160 card, BIOS 2.57.2) that I have been
> compiling in the aic7xxx driver for, and a tulip-driven network card. It
> has 1.5GB of RAM and 2x PIII 500 (kalamaths).
>
> I'm basically in a rut now. I'm trying any wacky kernel configuration that
> I can think of with no real plan. I have in the past managed to build
> stable kernels for various machines, but I just can't get a handle on this
> one. If anyone has any ideas, troubleshooting tips or dumb ideas... I would
> love to hear them.
>
> Nick
>
> --
> [EMAIL PROTECTED] mailing list
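Nathan's last step, matching the hex addresses from the watchdog dump against /proc/ksyms, is easy to get wrong by eye. The Python below is a minimal sketch of that lookup, not something posted in the thread. It assumes each ksyms line looks like `<hex address> <symbol> [module]`, with the bracketed module name present only for symbols exported by modules. The helper names `load_ksyms` and `resolve`, the sample addresses, and the `example_*` symbols are all hypothetical. For each address it reports the nearest symbol at or below it, plus the offset.

```python
import bisect
import re

# Assumed /proc/ksyms line shape on 2.4 kernels: "<hex address> <symbol> [module]",
# with the bracketed module name present only for symbols exported by modules.
KSYMS_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\S+)(?:\s+\[(\S+)\])?$")


def load_ksyms(text):
    """Parse ksyms-style text into (address, symbol, module) tuples sorted by address."""
    entries = []
    for line in text.splitlines():
        match = KSYMS_LINE.match(line.strip())
        if match:
            address = int(match.group(1), 16)
            entries.append((address, match.group(2), match.group(3)))
    entries.sort(key=lambda entry: entry[0])
    return entries


def resolve(entries, address):
    """Name the closest symbol at or below `address` as 'symbol+0xoffset [module]'.

    /proc/ksyms only lists exported symbols, so this is an approximation: the
    code that actually hung may be an unexported function placed after the match.
    No upper bound is checked, so an address past the last symbol still resolves
    to that last symbol. Returns None if the address lies below every symbol.
    """
    starts = [entry[0] for entry in entries]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name, module = entries[index]
    label = "%s+0x%x" % (name, address - start)
    if module:
        label += " [%s]" % module
    return label


if __name__ == "__main__":
    # Made-up sample: the addresses and the example_* names are illustrative only.
    sample = """c0118f10 printk
c0119a40 schedule
d0852000 example_fn [example_mod]"""
    table = load_ksyms(sample)
    # Addresses copied by hand from a (hypothetical) watchdog traceback.
    for addr in (0xc0119a58, 0xd0852010):
        print(hex(addr), resolve(table, addr))
```

On a real machine you would feed it a copy of /proc/ksyms taken from the same kernel, with the same modules loaded, that produced the dump, since module load addresses can change from one load to the next.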
