> On 2021-09-24, at 05:58, Philip Webb <[email protected]> wrote:
> 
> While I was asleep yesterday, my machine reported on all  3  Konsoles :
> 
> Message from syslogd@  at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
> 
> Message from syslogd@  at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000 
> 
> Message from syslogd@  at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
> 
> -- end of report --
> 
> I don't remember seeing this before : how concerned should I be ?

From the mcelog manpage:

       Most errors can be corrected by the CPU by internal error correction
       mechanisms. Uncorrected errors cause machine check exceptions which may
       kill processes or panic the machine. A small number of corrected errors
       is usually not a cause for worry, but a large number can indicate
       future failure.

       When an uncorrected machine check error happens that the kernel cannot
       recover from then it will usually panic the system. In this case when
       there was a warm reset after the panic mcelog should pick up the
       machine check errors after reboot. This is not possible after a cold
       reset.
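
To check whether the error you saw was corrected or uncorrected, you can look at the top bits of the status word from the "Bank 4" line. Here is a rough sketch in Python, assuming the standard x86 machine check status layout (VAL = bit 63, OVER = 62, UC = 61, EN = 60, MISCV = 59, ADDRV = 58, PCC = 57). The function names are my own, and the vendor-specific lower bits (the actual error code) are not decoded here; mcelog or the CPU vendor's documentation is the place for those.

```python
# Sketch only: decode the architectural flag bits of an x86 MCi_STATUS value.
# Bit positions assume the standard machine check status layout; the
# model-specific error-code fields in the lower bits are deliberately ignored.

STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register holds valid data
    58: "ADDRV",  # the ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status: int) -> dict:
    """Return a {flag name: bool} mapping for the flag bits above."""
    return {name: bool((status >> bit) & 1) for bit, name in STATUS_FLAGS.items()}


def is_corrected(status: int) -> bool:
    """True if the status is valid and the UC (uncorrected) bit is clear."""
    flags = decode_status(status)
    return flags["VAL"] and not flags["UC"]


if __name__ == "__main__":
    status = 0x9D0B4C16001D011B  # value from the "Bank 4" line in the report
    for name, is_set in decode_status(status).items():
        print(f"{name:6} {int(is_set)}")
    print("corrected" if is_corrected(status) else "uncorrected")
```

For the value in your report, UC and PCC are both clear, so under these assumptions it was a single corrected error, which is the case the manpage describes as usually not a cause for worry.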

If you are overclocking, try disabling it.

