On Thu, Dec 6, 2018 at 10:15 AM Linus Torvalds <torva...@linux-foundation.org> wrote: > > On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar <mi...@kernel.org> wrote: > > > > Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat > > inconsistent, sporadic handling of negatives. Here's our error code bits: > > > > /* > > * Page fault error code bits: > > * > > * bit 0 == 0: no page found 1: protection fault > > * bit 1 == 0: read access 1: write access > > * bit 2 == 0: kernel-mode access 1: user-mode access > > No. Really not at all. > > Bit 2 is *not* "kernel vs user". Never has been. Never will be. > > It's a single bit that mixes up *three* different cases: > > - regular user mode access (value: 1) > > - regular CPL0 access (value: 0) > > - CPU system access (value: 0) > > and that third case really is important and relevant. And importantly, > it can happen from user space. > > In fact, these days we possibly have a fourth case: > > - kernel access using wruss (value: 1)
Indeed. > > and I'd rather see just the numbers (which you have to know to decode) > than see the simplified AND WRONG decoding of those numbers. That's why the very next line in the OOPS explains this. > > Please don't ever confuse the fault U/S bit with "user vs kernel". > It's just not true, and people should be very very aware of it now > being true. > > If you care whether a page fault happened in user mode or not, you > have to look at the register state (ie "user_mode(regs)"). The code we're arguing over was part of a patch set to fix this confusion in the page fault code for real. > > Please call the U/S bit something else than "user" or "kernel". Dunno. I kind of like following the SDM. How do you like the attached patch?
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2ff25ad33233..f1f9e19646f5 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -610,8 +610,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index) static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt) { if (error_code & mask) { - if (buf[0]) - strcat(buf, " "); + strcat(buf, " "); strcat(buf, txt); } } @@ -651,23 +650,30 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad err_txt[0] = 0; /* - * Note: length of these appended strings including the separation space and the - * zero delimiter must fit into err_txt[]. + * Note: length of these appended strings including the separation + * space and the zero delimiter must fit into err_txt[]. */ err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" ); err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]"); + if ((error_code & (X86_PF_WRITE | X86_PF_INSTR)) == 0) + strcat(err_txt, " [read]"); err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" ); err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" ); err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]"); err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" ); - pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]"); + pr_alert("#PF error: 0x%04lx%s\n", error_code, err_txt); + /* + * The X86_PF_USER bit does *not* mean the same thing as + * user_mode(regs). Make sure that the unusual cases are obvious to + * the reader. + */ if (!(error_code & X86_PF_USER) && user_mode(regs)) { struct desc_ptr idt, gdt; u16 ldtr, tr; - pr_alert("This was a system access from user code\n"); + pr_alert("NB: This was a supervisor-privileged access from user code.\n"); /* * This can happen for quite a few reasons. The more obvious @@ -692,6 +698,14 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad store_tr(tr); show_ldttss(&gdt, "TR", tr); + } else if ((error_code & X86_PF_USER) && !user_mode(regs)) { + /* + * This can happen due to WRUSS. If an ISA extension ever + * gives new instructions to do user memory access from kernel + * mode and we OOPS due to a broken fixup, we'll presumably land + * here as well. + */ + pr_alert("NB: This was a user-privileged access from kernel code.\n"); } dump_pagetable(address);