Hi Mahesh,

On Fri, Jun 21, 2019 at 12:35:08PM +0530, Mahesh Jagannath Salgaonkar wrote:
> On 6/21/19 6:27 AM, Santosh Sivaraj wrote:
>> -       blocking_notifier_call_chain(&mce_notifier_list, 0, &evt);
>> +       rc = blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
>> +       if (rc & NOTIFY_STOP_MASK) {
>> +               evt->disposition = MCE_DISPOSITION_RECOVERED;
>> +               regs->msr |= MSR_RI;
>
> What is the reason for setting MSR_RI ? I don't think this is a good
> idea. MSR_RI = 0 means system got MCE interrupt when SRR0 and SRR1
> contents were live and was overwritten by MCE interrupt. Hence this
> interrupt is unrecoverable irrespective of whether machine check handler
> recovers from it or not.

Good catch! I think this is an artifact from when I was first trying to get all this working.

Instead of setting MSR_RI, we should probably just check for it, i.e.:

        if ((rc & NOTIFY_STOP_MASK) && (regs->msr & MSR_RI)) {
                evt->disposition = MCE_DISPOSITION_RECOVERED;
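
To make the intended behaviour concrete, here is a rough userspace sketch of
that check, not the actual patch. The struct types and the helper name
mce_mark_recovered() are hypothetical, and the constant values are stand-ins
for this example only; the kernel headers are authoritative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values for this sketch only; see the kernel headers for the real ones. */
#define NOTIFY_STOP_MASK 0x8000
#define MSR_RI           0x2UL

enum mce_disposition {
	MCE_DISPOSITION_NOT_RECOVERED = 0,
	MCE_DISPOSITION_RECOVERED = 1,
};

/* Hypothetical, trimmed stand-ins for struct pt_regs and struct machine_check_event. */
struct regs_sketch { unsigned long msr; };
struct evt_sketch { enum mce_disposition disposition; };

/*
 * Mark the event recovered only if a notifier stopped the chain AND the
 * interrupted context was recoverable (MSR_RI set). regs->msr is only read,
 * never modified.
 */
static bool mce_mark_recovered(int rc, const struct regs_sketch *regs,
			       struct evt_sketch *evt)
{
	assert(regs != NULL && evt != NULL);

	if ((rc & NOTIFY_STOP_MASK) && (regs->msr & MSR_RI)) {
		evt->disposition = MCE_DISPOSITION_RECOVERED;
		return true;
	}
	return false;
}
```

With MSR_RI clear, the disposition is left untouched even when a notifier
claims the event, which I think is the behaviour you are asking for.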

--
Reza Arbab
