On Mon, Apr 08, 2019 at 12:01:00AM -0700, John Nemeth wrote:
> On Apr 7, 9:48pm, "Aaron J. Grier" wrote:
> > On Wed, Mar 20, 2019 at 11:22:13AM -0700, John Nemeth wrote:
> > > (XEN) Bank 4: 945a4000fd080813 at ef3581180
> > > (XEN) MCE: polling routine found correctable error. Use mcelog to parse
> > > above error output.
> > [...]
> > > In any event, if I'm reading the above correctly, I believe
> > > that it is telling that there is bad memory?
> >
> > which CPU manufacturer and model is this?  memory is just one of
> > many possibilities which can generate machine check events.
>
> cpu0: "AMD Opteron(tm) Processor 6386 SE "
> cpu0: AMD Family 15h (686-class)
> cpu0: family 0x15 model 0x2 stepping 0 (id 0x600f20)

https://www.amd.com/system/files/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf

according to a cursory register decode based on the above document, it
does look like it could be an ECC-correctable memory error.

there's another MSR that keeps a count of how many DRAM errors have been
detected -- too bad NetBSD doesn't have an MSR driver.  ;)

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agr...@poofygoof.com
"The price of reliability is the pursuit of the utmost simplicity. It is
a price which the very rich find most hard to pay."  -- Tony Hoare
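
the cursory register decode mentioned above can be sketched in a few
lines of Python.  this is only an illustration, not the decode that was
actually run: the function name decode_mci_status and the FLAGS table are
made up for this sketch, bits 63..57 follow the common x86 machine-check
status layout, and the CECC/UECC positions (bits 46 and 45) and the
5-bit extended error code (bits 20:16) are assumptions that should be
checked against the BKDG linked above.

```python
# Hypothetical helper: split a 64-bit MCA status value into named fields.
# Bits 63..57 are the common x86 machine-check status flags; CECC/UECC
# (bits 46/45) and the extended error code (bits 20:16) are assumed
# from the AMD family 15h layout and should be verified in the BKDG.
FLAGS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
    "CECC": 46,   # correctable ECC error (assumed AMD-specific)
    "UECC": 45,   # uncorrectable ECC error (assumed AMD-specific)
}


def decode_mci_status(status):
    """Return a dict of flag booleans plus the two error-code fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    fields["error_code"] = status & 0xFFFF            # bits 15:0
    fields["error_code_ext"] = (status >> 16) & 0x1F  # bits 20:16
    return fields


if __name__ == "__main__":
    # Bank 4 status value from the Xen log quoted above.
    decoded = decode_mci_status(0x945A4000FD080813)
    for name, value in decoded.items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{name:15s} {shown}")
```

for the bank 4 value quoted above, this shows VAL, EN, ADDRV and CECC
set with UC and UECC clear, which is what a corrected ECC error would
be expected to look like; what the error code fields mean for this bank
still has to be looked up in the BKDG.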