> On 2021-09-24, at 05:58, Philip Webb <[email protected]> wrote:
>
> While I was asleep yesterday, my machine reported on all 3 Konsoles :
>
> Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
>
> Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
>
> Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
> : mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
>
> -- end of report --
>
> I don't remember seeing this before : how concerned should I be ?
From the mcelog manpage:

Most errors can be corrected by the CPU by internal error correction
mechanisms. Uncorrected errors cause machine check exceptions which may
kill processes or panic the machine. A small number of corrected errors
is usually not a cause for worry, but a large number can indicate
future failure.

When an uncorrected machine check error happens that the kernel cannot
recover from then it will usually panic the system. In this case when
there was a warm reset after the panic mcelog should pick up the
machine check errors after reboot. This is not possible after a cold
reset.

If you are overclocking, try disabling it.
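
You can also check by hand whether the status word on your "Bank 4" line is a corrected or an uncorrected error. Here is a minimal sketch in Python, assuming the standard x86 MCi_STATUS layout (VAL in bit 63 down to PCC in bit 57, MCA error code in the low 16 bits). The names FLAGS and decode_status are made up for this example, and the vendor-specific fields in between are deliberately left undecoded.

```python
# Architectural flag bits of an x86 MCi_STATUS register (assumed layout).
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register holds valid data
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status):
    """Return the set flags and the low 16-bit MCA error code."""
    set_flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": set_flags,
        "uncorrected": "UC" in set_flags,
        "mca_error_code": status & 0xFFFF,
    }


# Status word copied from the "Bank 4" line of the report.
result = decode_status(0x9D0B4C16001D011B)
print(result["flags"])
print("uncorrected" if result["uncorrected"] else "corrected")
print(hex(result["mca_error_code"]))
```

For your value the UC and PCC bits are clear, so under that assumption it was a corrected error. MISCV and ADDRV are set, which fits the ADDR and MISC values in the second message.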