On Fri, Sep 16, 2016 at 08:28:44PM +0000, Luck, Tony wrote:
> > For UE recovery support, currently we need mce=2 in command line
> > and also disable panic_on_oops with sysctl.
> 
> Please explain. I've never given mce=2 on command line, and have
> had my kernel recover from thousands of (injected) UE memory errors.

So frankly, that panic_on_oops doesn't make a whole lotta sense to me.

It is promoting MCEs with severity MCE_UC_SEVERITY and higher to a
panic.

So let's look at those:

        MCE_UC_SEVERITY,        - we don't do anything special in the kernel for
                                those, so panicking is just as well.
        MCE_AR_SEVERITY,        - those end up in the memory failure code if
                                they're memory errors
        MCE_PANIC_SEVERITY,     - causes panic

so if anything, panic_on_oops shouldn't control the panicking behavior
as tolerant does that already:

         * Tolerant levels:
         * 0: always panic on uncorrected errors, log corrected errors
         * 1: panic or SIGBUS on uncorrected errors, log corrected errors
         * 2: SIGBUS or log uncorrected errors (if possible), log corr. errors
         * 3: never panic or SIGBUS, log all errors (for testing only)
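
To make the difference concrete, here is a rough userspace sketch, not the
actual severity grading code. The function names and MCE_LOWER_SEVERITY are
made up for illustration, the enum lists only the levels mentioned above, and
treating tolerant == 0 as "always promote" is my reading of the level 0
description:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified: only the levels discussed above, in ascending order. */
enum severity_level {
	MCE_LOWER_SEVERITY,	/* hypothetical stand-in for everything below UC */
	MCE_UC_SEVERITY,
	MCE_AR_SEVERITY,
	MCE_PANIC_SEVERITY,
};

/*
 * Hypothetical model of the behavior being questioned: panic_on_oops
 * (or tolerant == 0) promotes UC and above to a panic.
 */
static enum severity_level promote_current(enum severity_level sev,
					    int tolerant, bool panic_on_oops)
{
	assert(tolerant >= 0 && tolerant <= 3);

	if (sev >= MCE_UC_SEVERITY && (panic_on_oops || tolerant == 0))
		return MCE_PANIC_SEVERITY;

	return sev;
}

/* Hypothetical model of the alternative: tolerant alone decides. */
static enum severity_level promote_proposed(enum severity_level sev,
					     int tolerant)
{
	assert(tolerant >= 0 && tolerant <= 3);

	if (sev >= MCE_UC_SEVERITY && tolerant == 0)
		return MCE_PANIC_SEVERITY;

	return sev;
}
```

With tolerant=1 and panic_on_oops set, promote_current() turns an
MCE_AR_SEVERITY error into MCE_PANIC_SEVERITY, while promote_proposed()
leaves it at MCE_AR_SEVERITY so it can still reach the memory failure code.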

IOW, I think that patch makes sense but please double-check my logic
above first.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.