On Thu, May 22, 2014 at 6:25 AM, Jiri Kosina <jkos...@suse.cz> wrote:
>
> Yeah, but it tries to send SIGBUS from MCE context. And if MCE triggered
> at the time the CPU was already holding sighand->siglock for that
> particular task, it'll deadlock against itself.

Don't worry too much about the MCE's. The hardware is f*cking broken,
and nobody sane ever thought that synchronous MCE's were a good idea.
Proof: look at Itanium.

The truly nonmaskable synchronous MCE's are a fatal error. It's that
simple. Anybody who thinks anything else is simply wrong, and has
probably talked to too many hardware engineers that don't actually
understand the bigger picture.

Sane hardware handles anything that *can* be handled in hardware, and
then reports (later) to software about the errors with a regular
non-critical MCE that doesn't punch through NMI or even regular
interrupt disabling.

So the true "MCE punches through even NMI protection" case is
relegated purely to the "hardware is broken and needs to be replaced"
situation, and our only worry as kernel people is to try to be as
graceful as possible about it - but that "as graceful as possible"
does *not* include bending over and worrying about random possible
deadlocks or other crazy situations. It's purely a "best effort" kind
of thing where we try to do whatever logging etc that is easy to do.

Seriously. If an NMI is interrupted by an MCE, you might as well
consider the machine dead. Don't worry about it. We may or may not
recover, but it is *not* our problem.

                Linus
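A minimal userspace sketch of the self-deadlock described in the quoted text, and of what a "best effort" handler might look like. Everything here is an assumption made for illustration: the names (demo_lock, demo_best_effort_handler, and so on) are hypothetical, a C11 atomic_flag stands in for a non-recursive kernel spinlock such as sighand->siglock, and none of this is the kernel's actual MCE code. The point is only that a handler running on top of the lock holder must not spin on that lock, so it tries once and skips the optional work if the lock is busy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock like sighand->siglock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Try exactly once to take the lock; never spin. True on success. */
static bool demo_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&demo_lock,
                                              memory_order_acquire);
}

static void demo_unlock(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * Hypothetical handler that may fire while the interrupted code on the
 * same CPU already holds demo_lock. Spinning here would never finish,
 * because the holder cannot run again until the handler returns. So the
 * handler tries once and gives up on the optional work if the lock is busy.
 * Returns true if the optional work (a log counter bump) was done.
 */
static bool demo_best_effort_handler(int *events_logged)
{
    if (!demo_trylock())
        return false;           /* lock busy: skip, do not deadlock */
    (*events_logged)++;         /* the "easy to do" logging */
    demo_unlock();
    return true;
}

/* Simulate the handler firing while the interrupted context holds the lock. */
static bool demo_handler_while_locked(int *events_logged)
{
    bool got = demo_trylock();  /* interrupted context takes the lock */
    assert(got);
    (void)got;
    bool done = demo_best_effort_handler(events_logged);
    demo_unlock();              /* interrupted context resumes and unlocks */
    return done;
}
```

Calling demo_handler_while_locked() returns false and leaves the counter untouched, where a handler that spun on the lock would hang forever.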