Charles Sprickman wrote:
On Fri, 18 Nov 2005, Uwe Doering wrote:
Charles Sprickman wrote:

I've been digging through Google for more information on this. I have a 4.8 box that's been up for about 430 days. In the last week or so, top and ps have started reporting all CPU usage numbers as zero, and running "systat -vmstat" results in the message "The alternate system clock has died! Reverting to ``pigs'' display".
[...]

We had this once at work, quite a while ago. The "alternate system clock" is in fact the Real Time Clock (RTC) on the mainboard. In our case we were lucky in that it was just the quartz device that failed due to an improperly soldered lead which finally came off. We fixed the soldering and the problem was gone.

Are there any tools to verify that the RTC is working?

"systat -vmstat" will show you the interrupt that it drives. In our case it's irq8, which is in fact labeled "rtc". It is supposed to run at 128 Hz. Under load it can drop to some lower value. This is normal.
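
If you would rather get a number than watch the screen, the same check can be sketched in Python by sampling the cumulative interrupt counters twice and dividing by the elapsed time. This is only a rough sketch under a few assumptions: that "vmstat -i" is available and prints one line per interrupt source with the total count as the second-to-last column and the rate as the last, and that the RTC line contains a field starting with "rtc". The function names here are made up for illustration.

```python
import subprocess
import time

EXPECTED_HZ = 128  # nominal RTC interrupt rate mentioned in this thread


def rtc_total(vmstat_output):
    """Return the cumulative count from the first line with an 'rtc' field.

    Assumes 'vmstat -i' style columns: name field(s), total, rate.
    Returns None if no such line is found.
    """
    for line in vmstat_output.splitlines():
        fields = line.split()
        # The last two fields are assumed to be total and rate;
        # everything before them is the interrupt name.
        if len(fields) >= 3 and any(f.startswith("rtc") for f in fields[:-2]):
            return int(fields[-2])
    return None


def rtc_rate(total_before, total_after, seconds):
    """Average interrupts per second between two counter samples."""
    return (total_after - total_before) / seconds


def measure(seconds=10):
    """Sample 'vmstat -i' twice and return the observed RTC rate in Hz."""
    def sample():
        result = subprocess.run(["vmstat", "-i"], capture_output=True,
                                text=True, check=True)
        return rtc_total(result.stdout)

    before = sample()
    time.sleep(seconds)
    after = sample()
    if before is None or after is None:
        return None  # no rtc line found, so the assumed format does not apply
    return rtc_rate(before, after, seconds)
```

Calling measure() on the affected box should return something near EXPECTED_HZ if the RTC interrupt is still firing. A result of 0.0 would match the "alternate system clock has died" symptom, since the counter is no longer advancing.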

I don't exactly understand what the RTC is, but would the machine not be suffering some other problems if there was an actual hardware failure? Doesn't the system rely on this to time everything from the processors to memory to PCI slots and interrupts?

No, the RTC drives only the interrupt that is responsible for collecting the CPU usage data. When it fails, the CPU usage in "top", "ps" etc. just drops to zero, as you've observed, but the server continues to run. If the failure is permanent, however, the machine refuses to boot. At least that's what happened in our case. Apparently the RTC chip is essential to the mainboard's boot sequence. For instance, the initial date and time information comes from this chip.

On the other hand, if a reset corrects the problem then the RTC chip probably got hung, or there is a problem with the interrupt controller it is connected to. On a properly working mainboard this shouldn't happen, of course.

Is there any simple way to figure out if this is hardware or software?

I don't know of any. However, we have been running FreeBSD since about 4.0, on various mainboards, UP and SMP, and we've never seen these symptoms except in the one case mentioned above. So I suppose it's not a kernel bug. I haven't looked at the PR database, though.

   Uwe
--
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED]  |  http://www.escapebox.net
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
