Hi, I'm having a problem with spontaneous restarts. This isn't a new problem, but I've done the obvious things and the problem hasn't gone away. I was thinking of asking on -hackers, but I'm trying here first.
The system runs FreeBSD 4.8, with a mix of patches and port upgrades of various ages. I'm planning to rebuild the whole thing and bring it up to date, but I'm hoping to wait until 5.x reaches -STABLE. I don't want to do this twice, since I expect I'll have to dump and restore everything.

The hardware is a 2.6 GHz P4 with 2 GB of GeIL dual-channel memory. (The problem also occurred with the previous, somewhat slower, memory.) The box contains the processor and motherboard (Gigabyte GA-SINXP1394), two floppy drives, CD and CD/W drives, an HP DAT drive, three IBM/Hitachi 36 GB, 10K RPM SCSI drives, and one 120 GB IDE drive. The SCSI card is an Adaptec; the video card is a low-end NVidia, and I'm running NVidia's video driver. The power supply is an Antec True380, which should be enough for the box, with something to spare. There are also several extra large fans; more on those later. The system, monitor, printer, and cable modem are all powered through an APC Back-UPS 450, about 18 months old. In the last week it has shown that it can keep everything up for more than an hour.

The symptom is a restart that leaves no indication of how it happened. Recently, the system shut down completely, power supply included, instead of restarting. In that case, the last deliberate shutdown had been a `shutdown -h now'; it appears that in every other case, the last deliberate shutdown was a `shutdown -r now'. (Question: does the machine architecture have settings for reset-and-resume vs. reset-and-halt, settings that might be remembered and applied when a later reset occurs?) Since then it has shut down again, this time with an immediate restart. There are no failure indications in /var/log/messages, and nothing is reported by dmesg. (The console scrolls by very quickly.)
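Since the console scrolls by too fast to catch anything, one rough way to locate these restarts after the fact is to scan /var/log/messages for the kernel's boot banner and report the last line logged before each one, plus how long the log was silent. The following is a minimal Python sketch of that idea, not an existing FreeBSD tool. It assumes classic syslog lines of the form "Mon D HH:MM:SS host ...", assumes every boot logs a "/kernel: Copyright (c) 1992-2003 The FreeBSD Project." banner, and assumes the lines compared fall in the same year; the names find_restarts and parse_stamp are hypothetical.

```python
import re
from datetime import datetime

# Assumption: each boot logs the kernel banner line
# "/kernel: Copyright (c) 1992-2003 The FreeBSD Project." (years may differ).
BOOT_BANNER = re.compile(r"/kernel: Copyright \(c\) \d{4}-\d{4} The FreeBSD Project\.")


def parse_stamp(line):
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a syslog line.

    Syslog omits the year, so a placeholder leap year (2000) is supplied.
    Assumption: both lines being compared fall in the same calendar year.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"2000 {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def find_restarts(lines):
    """Yield (last_line_before_boot, boot_line, silent_gap) per boot banner.

    The last line logged before a banner is the latest moment the system
    is known to have been running, so the gap bounds when it went down.
    """
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_BANNER.search(line) and previous is not None:
            yield previous, line, parse_stamp(line) - parse_stamp(previous)
        previous = line
```

For example, opening /var/log/messages and iterating over find_restarts() on the file object prints one entry per boot found. Comparing each "before" line against the times of deliberate `shutdown -r now' runs would help separate those from the unexplained restarts; the script can only bound when the machine went down, not why.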
The message sequence over the restart typically looks like this:

=======================================================================
Jun 7 18:39:09 moleend /kernel: arp: 184.108.40.206 moved from 00:05:00:e7:17:44 to 00:05:00:e7:17:57 on em0
Jun 7 18:39:09 moleend /kernel: arp: 220.127.116.11 moved from 00:05:00:e7:17:57 to 00:05:00:e7:17:44 on em0
Jun 7 18:59:06 moleend dhclient: New Network Number: 18.104.22.168
Jun 7 18:59:06 moleend dhclient: New Broadcast Address: 255.255.255.255
Jun 7 22:47:33 moleend /kernel: Copyright (c) 1992-2003 The FreeBSD Project.
Jun 7 22:47:33 moleend /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
========================================================================

The restart most often occurs AFTER X has been shut down (and often restarted), but sometimes when X has not been run at all. It most often occurs when the system is under heavy CPU load, but sometimes when the load has been light.

At one time I thought it might be a thermal problem and set out to fix that. (I am still working on getting more cooling air over the disks.) Right now, I have 120 mm fans rated at 130-135 CFM (Panaflow and JMC) pushing air into and out of the box and pressurizing a duct that feeds the CPU cooler, which is now cool to the touch. The memory modules are cool to the touch. While the disks need a proper plenum to route more air over them, I no longer believe that there is a thermal problem. The video card's fan-cooled heatsink is warm (not hot) to the touch, and so is the northbridge's. (Some people commute to white-collar jobs in heavy pickups; I drive a small server as my PC. No chrome pipes.)

So: what should I do next? Should I set the system up to drop into the kernel debugger on panic, or even boot it under the kernel debugger? (Where is the full documentation?) Should I shell out for an even bigger power supply? Is there another log that I should examine?
A restart wire that I should check? A power bus I should scope? (I'll have to borrow a scope somewhere.) Is it time for an exorcist?

Thanks for your help.

Mark Terribile
_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions