> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE
you'll still see the machine check - and you'll pull the big system reset
lever.

If you think the other cpus can survive the reset - then the right thing
to do is to have any offline cpus that show up in the machine check
handler just clear MCG_STATUS and return:

do_machine_check()
{
        /*
         * Offline cpus may show up for the party, but they don't need
         * to do anything here - send them back home.
         */
        if (!cpu_online(smp_processor_id())) {
                mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
                return;
        }
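
For anyone who wants to poke at the control flow outside the kernel, here
is a rough user-space sketch of the same idea. It is not kernel code:
every sim_* name, the four-cpu count, and the assumption that the machine
check is delivered to every cpu (online or not) are made up for
illustration. The only real detail is that MCIP is bit 2 of
IA32_MCG_STATUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NR_CPUS 4                      /* hypothetical cpu count */
#define SIM_MCG_STATUS_MCIP (1ULL << 2)    /* MCIP: machine check in progress */

bool sim_cpu_is_online[SIM_NR_CPUS];       /* stand-in for cpu_online() */
uint64_t sim_mcg_status[SIM_NR_CPUS];      /* stand-in for per-cpu MCG_STATUS */
int sim_handled_count;                     /* cpus that ran the full handler */

/* Assumption: the event reaches every cpu, so MCIP gets set everywhere. */
void sim_raise_machine_check(void)
{
        for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
                sim_mcg_status[cpu] |= SIM_MCG_STATUS_MCIP;
}

/* Simulated handler entry for one cpu. */
void sim_do_machine_check(int cpu)
{
        assert(cpu >= 0 && cpu < SIM_NR_CPUS);

        /* Offline cpu: clear MCG_STATUS and go straight back home. */
        if (!sim_cpu_is_online[cpu]) {
                sim_mcg_status[cpu] = 0;
                return;
        }

        /* Online cpu: the real handler's work would happen here. */
        sim_handled_count++;
        sim_mcg_status[cpu] = 0;
}
```

Mark a couple of cpus online, call sim_raise_machine_check(), then call
sim_do_machine_check() for each cpu: only the online ones bump
sim_handled_count, and every cpu ends up with MCG_STATUS cleared.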

If we are crashing because of a machine check - I wonder how useful it is
to run kdump().  There is a very small set of ways that you can induce a
machine check from program action - normally the problem is that
something bad happened in the h/w ... a kdump will just fill your disk
and waste your time looking at what the s/w was doing when the machine
check happened.

-Tony