On 11/28/2015 11:01 AM, Dan Johansson wrote:
> I have started noticing the following messages in the dmesg output (and
> in the log-files) on my Gentoo rig:
>
> [46545.779803] [Hardware Error]: Corrected error, no action required.
> [46545.779984] [Hardware Error]: CPU:3 (15:2:0) MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc2540f000040136
> [46545.780434] [Hardware Error]: MC2 Error Address: 0x00000002cc215138
> [46545.780605] [Hardware Error]: MC2 Error: Fill ECC error on data fills.
> [46545.783764] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
> [46545.784088] mce: [Hardware Error]: Machine check events logged
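For reference, the raw MC2_STATUS value in that log can be decoded by hand. The sketch below uses hypothetical helper names (`decode_mci_status`, `kernel_flag_string`). Its bit positions are an assumption based on my reading of the Linux AMD MCE decoder for family 15h, so check them against AMD's documentation for your CPU before relying on them. It splits 0xdc2540f000040136 into its status flags and the low 16-bit memory error code.

```python
# Sketch: decode an AMD MCi_STATUS value (here MC2_STATUS) into the fields
# the kernel prints. Bit positions are an ASSUMPTION based on my reading of
# the Linux AMD MCE decoder for family 15h; verify against AMD's docs.


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and memory-error fields."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    ecc = (status >> 45) & 0x3  # bits 46:45; 2 -> CECC, 1 or 3 -> UECC
    ec = status & 0xFFFF        # low 16 bits: error code

    flags = {
        "Val": bit(63),       # register holds a valid error
        "Over": bit(62),      # an earlier error was still logged (overflow)
        "UC": bit(61),        # uncorrected error
        "En": bit(60),        # error reporting enabled
        "MiscV": bit(59),     # MCi_MISC is valid
        "AddrV": bit(58),     # MCi_ADDR is valid
        "PCC": bit(57),       # processor context corrupt
        "Deferred": bit(44),
        "Poison": bit(43),
        "CECC": ecc == 2,
        "UECC": ecc in (1, 3),
    }
    decoded = {"flags": flags, "error_code": ec}

    # Memory error codes have the form 0000 0001 RRRR TTLL.
    if (ec & 0xFF00) == 0x0100:
        tt_names = ("INSN", "DATA", "GEN", "RESV")
        ll_names = ("RESV", "L1", "L2", "L3/GEN")
        rrrr_names = ("GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP")
        r4 = (ec >> 4) & 0xF
        decoded["cache_level"] = ll_names[ec & 0x3]
        decoded["tx"] = tt_names[(ec >> 2) & 0x3]
        decoded["mem_tx"] = rrrr_names[r4] if r4 < len(rrrr_names) else "unknown"
    return decoded


def kernel_flag_string(flags: dict) -> str:
    """Rebuild the bracketed flag list, e.g. 'Over|CE|MiscV|-|AddrV|-|-|CECC'."""
    parts = [
        "Over" if flags["Over"] else "-",
        "UE" if flags["UC"] else ("-" if flags["Deferred"] else "CE"),
        "MiscV" if flags["MiscV"] else "-",
        "PCC" if flags["PCC"] else "-",
        "AddrV" if flags["AddrV"] else "-",
        "Deferred" if flags["Deferred"] else "-",
        "Poison" if flags["Poison"] else "-",
    ]
    if flags["CECC"]:
        parts.append("CECC")
    elif flags["UECC"]:
        parts.append("UECC")
    return "|".join(parts)


if __name__ == "__main__":
    d = decode_mci_status(0xdc2540f000040136)
    print(kernel_flag_string(d["flags"]))
    print(d["cache_level"], d["tx"], d["mem_tx"])
```

Run as a script, this prints `Over|CE|MiscV|-|AddrV|-|-|CECC` and `L2 DATA DRD`, matching the kernel's own decoding above. The result is a corrected ECC error with the overflow bit set, meaning another error was already logged in that bank when this one arrived.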
Are you using ECC memory? I saw the same errors on a machine I had just
finished building, which had some faulty ECC DIMMs installed.

> I have been running memtest for some time (~100h) and have not gotten
> any error message - so I am suspecting that this is a CPU problem. Am I
> correct?

In my case memtest didn't find any errors after a night of running
either, but once I booted Gentoo the errors occurred more often the
longer the machine had been running or the more packages I had compiled.
I think the version of memtest I was running didn't take error
correction into account, so every test passed even though the ECC
hardware was correcting errors to keep reads and writes intact.

> If it was just these error-messages I would not be that worried, but I
> have started to get a lot of "hangers" on this rig when compiling larger
> packages. Could there be a relation to the error-messages?

What I'd try is to find the DIMM that's causing these errors and see how
your machine runs without it installed. I used EDAC [0] and edac-utils
[1] to find my faulty DIMMs.

- Boy

[0] https://www.kernel.org/doc/Documentation/edac.txt
[1] https://packages.gentoo.org/package/sys-apps/edac-utils
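A rough sketch of the EDAC step for locating the DIMM. With EDAC loaded, the kernel exposes corrected-error counters in sysfs under /sys/devices/system/edac/mc/, which is the interface edac-util reports from. The helper name `read_dimm_ce_counts` is hypothetical. It assumes a kernel and driver that expose per-DIMM `dimm*` directories containing `dimm_ce_count` and `dimm_label`. Some drivers use `rank*` entries instead, older kernels only provide `csrow*`, and the labels may be generic placeholders rather than the slot names printed on the board.

```python
from __future__ import annotations

from pathlib import Path

# Default EDAC memory-controller tree in sysfs (ASSUMPTION: EDAC is loaded
# and the driver exposes per-DIMM "dimm*" directories).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_dimm_ce_counts(root: Path = EDAC_MC_ROOT) -> list[tuple[str, str, int]]:
    """Return (path, label, corrected_errors) per DIMM, highest count first."""
    if not root.is_dir():
        return []
    entries = []
    for dimm in sorted(root.glob("mc*/dimm*")):
        ce_file = dimm / "dimm_ce_count"
        if not ce_file.is_file():
            continue  # skip entries without a corrected-error counter
        label_file = dimm / "dimm_label"
        label = label_file.read_text().strip() if label_file.is_file() else ""
        count = int(ce_file.read_text().strip())
        entries.append((f"{dimm.parent.name}/{dimm.name}", label, count))
    # Python's sort is stable, so DIMMs with equal counts keep path order.
    entries.sort(key=lambda e: e[2], reverse=True)
    return entries


if __name__ == "__main__":
    for path, label, ce in read_dimm_ce_counts():
        print(f"{path} ({label or 'no label'}): {ce} corrected errors")
```

Running it once after boot and again after a large compile should show which counter is climbing. That DIMM is the first candidate to pull.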