On Fri, Jan 15, 2021 at 04:27:54PM +0100, Borislav Petkov wrote:
> On Thu, Jan 14, 2021 at 04:38:17PM -0800, Tony Luck wrote:
> > Add a "mce_busy" counter so that task_work_add() is only called once
> > per faulty page in this task.
> 
> Yeah, that sentence can be removed now too.

I will update it with the new name "mce_count" and add some details.

> > -static void queue_task_work(struct mce *m, int kill_current_task)
> > +static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
> 
> So this function gets called in the user mode MCE case too:
> 
>       if ((m.cs & 3) == 3) {
> 
>               queue_task_work(&m, msg, kill_current_task);
>       }
> 
> Do we want to panic for multiple MCEs to different addresses in user
> mode?

In the user mode case we should only bump mce_count to "1" before
task_work() gets called. It shouldn't hurt to do the same checks. Maybe
they will catch something weird, like an NMI handler doing a get_user()
that hits another machine check during the return from this machine
check.

AndyL has made me extra paranoid. :-)

> > -   current->mce_addr = m->addr;
> > -   current->mce_kflags = m->kflags;
> > -   current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> > -   current->mce_whole_page = whole_page(m);
> > +   if (current->mce_count++ == 0) {
> > +           current->mce_addr = m->addr;
> > +           current->mce_kflags = m->kflags;
> > +           current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> > +           current->mce_whole_page = whole_page(m);
> > +   }
> > +
> 
>       /* Magic number should be large enough */
>
> > +   if (current->mce_count > 10)

Will add similar comment here ... and to other tests in this function
since it may not be obvious to me next year what I was thinking now :-)

> > +   if (current->mce_count > 10)
> > +           mce_panic("Too many machine checks while accessing user data", m, msg);
> > +
> > +   if (current->mce_count > 1 || (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
> > +           mce_panic("Machine checks to different user pages", m, msg);
> 
> Will this second part of the test expression, after the "||" ever hit?

No :-( This code is wrong. Should be "&&" not "||". Then it makes more sense.
Will fix for v4.
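To make the difference concrete, here is a minimal userspace sketch of the
two predicates. The helper names panic_v3()/panic_v4() are hypothetical, and
PAGE_SHIFT is hard-coded to 12 on the assumption of 4 KiB pages. With "||" a
second MCE panics even when it lands on the same page, and on the first MCE
mce_addr was just set to m->addr, so the page comparison never decides
anything. With "&&" only a repeat MCE to a different page panics.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* v3 test: any repeat MCE panics, so the page compare is dead code. */
static bool panic_v3(int count, unsigned long first, unsigned long addr)
{
	return count > 1 || (first >> PAGE_SHIFT) != (addr >> PAGE_SHIFT);
}

/* v4 test: panic only on a repeat MCE to a different page. */
static bool panic_v4(int count, unsigned long first, unsigned long addr)
{
	return count > 1 && (first >> PAGE_SHIFT) != (addr >> PAGE_SHIFT);
}
```

For example, a second MCE at 0x1008 after one at 0x1000 (same page) makes
panic_v3() true but panic_v4() false.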

> In any case, what are you trying to catch with this? Two get_user() to
> different pages both catching MCEs?

Yes. Trying to catch two accesses to different pages. Need to do this
because kill_me_maybe() is only going to offline one page.

I'm not expecting that this would ever hit. It means that the calling code
took a machine check on one page and get_user() returned -EFAULT. Then the
code decided to access a different page *and* that other page also triggered
a machine check.

> > +   /* Do not call task_work_add() more than once */
> > +   if (current->mce_count > 1)
> > +           return;
> 
> That won't happen either, AFAICT. It'll panic above.

With the s/||/&&/ above, we can get here.
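Putting the corrected checks together, a rough userspace model of the flow
might look like the sketch below. It is only an illustration:
struct task_mce_state, enum mce_outcome and model_queue_task_work() are
hypothetical names, and the model returns an outcome instead of calling
mce_panic() or task_work_add().

```c
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the per-task fields added by the patch. */
struct task_mce_state {
	int mce_count;
	unsigned long mce_addr;
};

enum mce_outcome {
	MCE_QUEUED,		/* first MCE: task_work_add() would be called */
	MCE_ALREADY_QUEUED,	/* repeat MCE on the same page: return early */
	MCE_PANIC_TOO_MANY,	/* more than 10 MCEs before task_work ran */
	MCE_PANIC_OTHER_PAGE,	/* repeat MCE on a different page */
};

static enum mce_outcome model_queue_task_work(struct task_mce_state *t,
					      unsigned long addr)
{
	/* Only the first MCE records the faulting address. */
	if (t->mce_count++ == 0)
		t->mce_addr = addr;

	/* Magic number should be large enough */
	if (t->mce_count > 10)
		return MCE_PANIC_TOO_MANY;

	/* kill_me_maybe() only offlines one page. */
	if (t->mce_count > 1 &&
	    (t->mce_addr >> PAGE_SHIFT) != (addr >> PAGE_SHIFT))
		return MCE_PANIC_OTHER_PAGE;

	/* Do not call task_work_add() more than once */
	if (t->mce_count > 1)
		return MCE_ALREADY_QUEUED;

	return MCE_QUEUED;
}
```

In this model a second MCE on the same page reaches the early return instead
of panicking, which is the case Boris asked about.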
> 
> Regardless, I like how this is all confined to the MCE code and there's
> no need to touch stuff outside...

Thanks for the review.

-Tony
