Hey all, I had a kernel panic on a RHEL4 box the other day, and since then I've been digging into postmortem analysis. Is anyone else doing this? After reading up on 'crash' and related tools, I've decided to add crash dump capture to the list of standard things I do when setting up a new box. It looks like a good way to build kernel knowledge, and since I want to do embedded Linux development as my master's project, that's something I need.
On RHEL4 there are netdump and diskdump. The idea is to capture the contents of your RAM so you can debug the kernel with GDB.

The netdump docs say it's supposed to be the better option, since the kernel can put the NIC in a mode that doesn't require it to respond to IRQs. I guess this is so that when the capture occurs, there are fewer things that could stomp on your RAM. What's supposed to happen is that the netdump module fires up when the panic occurs, authenticates (via ssh if you desire) against a server, then starts feeding it UDP packets in cleartext on port 6666. The dump should then appear in /var/crash on the server. But for me it didn't work. The docs don't tell you to open port 6666 in iptables, though it's pretty obvious that it'll be closed out of the box on pretty much any modern distro. They also say you need to be using a supported NIC, which I'm not.

Diskdump worked fine, though. All I had to do was specify that I wanted to dump to the swap partition and then start the tool. To test the setup, I had a great time crashing the box with the Magic SysRq key: Alt-SysRq-C. First time in 10 years I've wanted to crash a box on purpose! Then on bootup I was pleased to see the raw crash dump file being copied back out of the swap partition and into /var/crash.

Pass the /var/crash/.../vmcore to crash along with the kernel you built with debug info, or the one from the kernel-debuginfo rpm, and you're off. Now I can run 'ps' and 'fuser', look at the task_struct, interrupts, pretty much anything! I've got 4 books on the kernel, and now I'm finally getting some test data.

I was wondering if anyone on this list is already doing this regularly and wants to share some insight. Or if anyone has a system that crashes, I volunteer to look into it (contact me directly at dave _AT_ peoplemerge.com).

Dave

--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
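
P.S. In case it helps anyone trying the same thing, here is a minimal sketch of the last step: locating the newest vmcore under /var/crash and building the crash command line for it. The function names find_vmcores and crash_command are made up for illustration. The sketch assumes only that the dump file is named vmcore somewhere below /var/crash and that crash takes the debug kernel followed by the dump file. The path to the debug vmlinux is left for you to supply.

```python
import os


def find_vmcores(crash_root="/var/crash"):
    """Return every file named 'vmcore' below crash_root, newest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(crash_root):
        if "vmcore" in filenames:
            found.append(os.path.join(dirpath, "vmcore"))
    # Sort by modification time so the most recent dump comes first.
    found.sort(key=os.path.getmtime, reverse=True)
    return found


def crash_command(vmlinux, vmcore):
    """Build the argument list for 'crash <debug kernel> <dump file>'."""
    # vmlinux must be the debug-info kernel matching the crashed kernel;
    # its location is an assumption left to the caller.
    return ["crash", vmlinux, vmcore]
```

For example, crash_command(my_vmlinux, find_vmcores()[0]) gives a list you could hand to subprocess.call, where my_vmlinux is whatever path your debug kernel lives at. If no dump has been saved yet, find_vmcores() just returns an empty list.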
