> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE
you'll still see the machine check - and you'll pull the big system
reset lever.

If you think the other cpus can survive the reset - then the right
thing to do is to have any offline cpus that show up in the machine
check handler just clear MCG_STATUS and return:

	do_machine_check()
	{
		/*
		 * offline cpus may show up for the party - but don't
		 * need to do anything here - send them back home
		 */
		if (!cpu_online(smp_processor_id())) {
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
		...
	}

If we are crashing because of a machine check - I wonder how useful it
is to run kdump(). There is a very small set of ways that you can
induce a machine check from program action - normally the problem is
that something bad happened in the h/w ... a kdump will just fill your
disk and waste your time looking at what the s/w was doing when the
machine check happened.

-Tony
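
For illustration only, the early-return path suggested above can be exercised outside the kernel. The following is a minimal user-space sketch, not kernel code: smp_processor_id(), cpu_online() and mce_wrmsrl() are stubbed with mock state, and NR_CPUS, the mock_* arrays and broadcast_mce() are hypothetical names invented for this example. It only shows that offline cpus clear their MCG_STATUS and leave, while online cpus fall through to the (here, counted but otherwise omitted) real handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4                     /* hypothetical cpu count for the mock */
#define MSR_IA32_MCG_STATUS 0x17a     /* architectural MSR address */
#define MCG_STATUS_MCIP (1ULL << 2)   /* machine check in progress */

/* Mock state standing in for kernel and hardware state (all hypothetical). */
static bool     mock_online[NR_CPUS];
static uint64_t mock_mcg_status[NR_CPUS];
static int      mock_handled[NR_CPUS];
static int      mock_current_cpu;

/* User-space stand-ins for the kernel helpers used in the handler. */
static int smp_processor_id(void)
{
    return mock_current_cpu;
}

static bool cpu_online(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return mock_online[cpu];
}

static void mce_wrmsrl(uint32_t msr, uint64_t val)
{
    /* Only MCG_STATUS is modelled; other MSR writes are ignored. */
    if (msr == MSR_IA32_MCG_STATUS)
        mock_mcg_status[smp_processor_id()] = val;
}

static void do_machine_check(void)
{
    int cpu = smp_processor_id();

    /* Offline cpus clear MCG_STATUS and go back home. */
    if (!cpu_online(cpu)) {
        mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
        return;
    }

    /* Stand-in for the real handling done on online cpus. */
    mock_handled[cpu]++;
}

/* Model a broadcast machine check: every cpu enters the handler. */
static void broadcast_mce(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        mock_current_cpu = cpu;
        mock_mcg_status[cpu] = MCG_STATUS_MCIP;
        do_machine_check();
    }
}
```

To try it, mark some entries of mock_online[] as true, call broadcast_mce(), and compare mock_handled[] and mock_mcg_status[] for the online and offline cpus. The mock leaves MCG_STATUS untouched on online cpus because the rest of the handler is not modelled here.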