*>>Are those the only MCA errors you're seeing? The reason I ask is that
there's an erratum in the X5600 series which can cause an "internal timer
error" MCA to be logged after another uncorrectable MCA occurs.*

About 90% of the failures are these MCA errors. For the remaining 10% there
is no log at all. For example, one of the Supermicro servers rebooted two
days ago but did not generate a crash dump under /var/crash, even though
dumps are enabled in rc.conf:


*>>This seems to me like it would be a CPU failure.  Can you try replacing 
the CPU itself?  I've seen this exact message on a different board, and 
the cause was a failing CPU. *

We're thinking of replacing the X5690 CPUs with X5675s.

*>>Well, mcelog has this hardcoded and prints this for every MCA just as a 
matter of course.  It isn't selective but assumes every machine check is 
a hardware error (which they are, though some are warnings for corrected 
events that you can ignore as the hardware hasn't degraded enough to 
warrant replacement.  However, corrected events don't generate panics, 
just messages in the logs, and only a subset of corrected events include 
the "yellow / green" indicators for which you can ignore "green" events. 
Even corrected ECC errors I would ignore if you get a few events with 
a count of 1 that don't recur). *

Each time the MCA error occurs, the server goes down. Please advise how we
should tackle this issue.

*>>Depending on the CPU model, you can determine more info about the 
error using the CPU manuals (for Intel the SDM).*

The CPU is an X5690. Is there a link where we can get the manual for the
X5690 CPUs in these Supermicro servers?
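
For reference while reading the SDM, here is a minimal sketch of how a raw
machine-check status value could be split into fields. It assumes the
architectural IA32_MCi_STATUS layout from the Intel SDM machine-check
chapter (flag bits 63 down to 57, model-specific code in bits 31:16, MCA
error code in bits 15:0) and the simple error code 0x0400 for "internal
timer error". The function name and the sample value are made up for
illustration; the value is not taken from any of our logs.

```python
# Flag bit positions in IA32_MCi_STATUS (assumed from the Intel SDM).
STATUS_FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}

INTERNAL_TIMER_ERROR = 0x0400  # simple MCA error code (assumed from the SDM)


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status value into flags and error codes."""
    decoded = {name: bool((status >> bit) & 1)
               for name, bit in STATUS_FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    decoded["internal_timer_error"] = (
        decoded["mca_error_code"] == INTERNAL_TIMER_ERROR
    )
    return decoded


if __name__ == "__main__":
    # Hypothetical status value, chosen only to exercise the decoder.
    for field, value in decode_mci_status(0xBE00000000800400).items():
        print(field, value)
```

The model-specific code is left undecoded here because its meaning depends
on the CPU model, so that part still has to be looked up in the manual.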

Sent from the freebsd-current mailing list archive at Nabble.com.
freebsd-current@freebsd.org mailing list