On Wed, May 22, 2019 at 10:22 AM mark <m.r...@5-cent.us> wrote:

> It seems unlikely. It's a 4U server, with 36 disks (and the dual root
> disks), in a machine room, and ipmitool sel list shows nada, nor are there
> any warnings, as I've seen on other systems occasionally, that the CPU is
> overheating, and is being throttled.


If this is a recent server (Ivy Bridge/Haswell/Broadwell), then I’ve seen
the “edac” kernel module prevent the SEL from showing faults when an MCE
(machine-check exception) occurs. Disable edac and, poof, the server stops
crashing and/or the SEL shows something useful (ECC/MCE). Did you check
/var/log/mcelog?
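
A rough sketch of how one might check both things at once: which EDAC
modules are currently loaded, and whether /var/log/mcelog has anything in
it. It assumes a Linux box where /proc/modules is readable; the helper name
edac_modules is made up for illustration, and matching on the substring
"edac" is just a heuristic for catching drivers such as sb_edac or
edac_core.

```python
from pathlib import Path


def edac_modules(proc_modules_text):
    """Return names of loaded kernel modules whose name contains 'edac'.

    Expects text in the /proc/modules format, where the first
    whitespace-separated field on each line is the module name.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0].lower():
            names.append(fields[0])
    return names


def main():
    modules_path = Path("/proc/modules")
    mcelog_path = Path("/var/log/mcelog")  # path mentioned in the post

    if modules_path.exists():
        loaded = edac_modules(modules_path.read_text())
        print("EDAC modules loaded:", ", ".join(loaded) or "none")
    else:
        print("/proc/modules not found (not a Linux host?)")

    # A non-empty mcelog suggests machine-check events were recorded.
    if mcelog_path.is_file() and mcelog_path.stat().st_size > 0:
        print(f"{mcelog_path} is non-empty; worth reading")
    else:
        print(f"{mcelog_path} is missing or empty")


if __name__ == "__main__":
    main()
```

If an EDAC driver does show up, the usual way to keep it from loading is a
blacklist entry under /etc/modprobe.d/ followed by a reboot, though which
module name applies depends on the platform.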
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
