On Thu, Dec 6, 2018 at 10:15 AM Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar <mi...@kernel.org> wrote:
> >
> > Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> > inconsistent, sporadic handling of negatives. Here's our error code bits:
> >
> > /*
> >  * Page fault error code bits:
> >  *
> >  *   bit 0 ==    0: no page found       1: protection fault
> >  *   bit 1 ==    0: read access         1: write access
> >  *   bit 2 ==    0: kernel-mode access  1: user-mode access
>
> No. Really not at all.
>
> Bit 2 is *not* "kernel vs user".  Never has been. Never will be.
>
> It's a single bit that mixes up *three* different cases:
>
>  - regular user mode access (value: 1)
>
>  - regular CPL0 access (value: 0)
>
>  - CPU system access (value: 0)
>
> and that third case really is important and relevant. And importantly,
> it can happen from user space.
>
> In fact, these days we possibly have a fourth case:
>
>  - kernel access using wruss (value: 1)

Indeed.

>
> and I'd rather see just the numbers (which you have to know to decode)
> than see the simplified AND WRONG decoding of those numbers.

That's why the very next line in the OOPS explains this.

>
> Please don't ever confuse the fault U/S bit with "user vs kernel".
> It's just not true, and people should be very very aware of it now
> being true.
>
> If you care whether a page fault happened in user mode or not, you
> have to look at the register state (ie "user_mode(regs)").

The code we're arguing over was part of a patch set to fix this
confusion in the page fault code for real.

>
> Please call the U/S bit something else than "user" or "kernel".

Dunno.  I kind of like following the SDM.

How do you like the attached patch?
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..f1f9e19646f5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -610,8 +610,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
 {
 	if (error_code & mask) {
-		if (buf[0])
-			strcat(buf, " ");
+		strcat(buf, " ");
 		strcat(buf, txt);
 	}
 }
@@ -651,23 +650,30 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	err_txt[0] = 0;
 
 	/*
-	 * Note: length of these appended strings including the separation space and the
-	 * zero delimiter must fit into err_txt[].
+	 * Note: length of these appended strings including the separation
+	 * space and the zero delimiter must fit into err_txt[].
 	 */
 	err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
 	err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
+	if ((error_code & (X86_PF_WRITE | X86_PF_INSTR)) == 0)
+		strcat(err_txt, " [read]");
 	err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
 	err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
 	err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
 	err_str_append(error_code, err_txt, X86_PF_PK,    "[PK]"   );
 
-	pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+	pr_alert("#PF error: 0x%04lx%s\n", error_code, err_txt);
 
+	/*
+	 * The X86_PF_USER bit does *not* mean the same thing as
+	 * user_mode(regs).  Make sure that the unusual cases are obvious to
+	 * the reader.
+	 */
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
 
-		pr_alert("This was a system access from user code\n");
+		pr_alert("NB: This was a supervisor-privileged access from user code.\n");
 
 		/*
 		 * This can happen for quite a few reasons.  The more obvious
@@ -692,6 +698,14 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 
 		store_tr(tr);
 		show_ldttss(&gdt, "TR", tr);
+	} else if ((error_code & X86_PF_USER) && !user_mode(regs)) {
+		/*
+		 * This can happen due to WRUSS.  If an ISA extension ever
+		 * gives new instructions to do user memory access from kernel
+		 * mode and we OOPS due to a broken fixup, we'll presumably land
+		 * here as well.
+		 */
+		pr_alert("NB: This was a user-privileged access from kernel code.\n");
 	}
 
 	dump_pagetable(address);

Reply via email to