On Wed, May 21, 2014 at 3:01 PM, Luck, Tony <tony.l...@intel.com> wrote:
>> But sending signals from #MC context is definitely a bad idea. I think
>> we had addressed this with irq_work at some point but my memory is very
>> hazy.
>
> We added code for recoverable errors to get out of the MC context
> before trying to lookup the page and send the signal. Bottom of
> do_machine_check():
>
>         if (cfg->tolerant < 3) {
>                 if (no_way_out)
>                         mce_panic("Fatal machine check on current CPU", &m, msg);
>                 if (worst == MCE_AR_SEVERITY) {
>                         /* schedule action before return to userland */
>                         mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV);
>                         set_thread_flag(TIF_MCE_NOTIFY);
>                 } else if (kill_it) {
>                         force_sig(SIGBUS, current);
>                 }
>         }
>
> That TIF_MCE_NOTIFY prevents the return to user mode, and we end up in
> mce_notify_process().

Why is this necessary? If the MCE hit kernel code, then we're going
to die anyway. If the MCE hit user code, then we should be in a
completely sensible context and we can just send the signal.

--Andy

>
> The "force_sig()" there is legacy code - and perhaps should just move off to
> mce_notify_process() too (need to save "worst" so it will know what to do).
>
> -Tony

--
Andy Lutomirski
AMA Capital Management, LLC