When a syscall returns to userspace with TIF_DISABLE_PTI_NOW set on the
task, the task is configured to run with page table isolation (PTI)
disabled. In this case, returns from kernel to user do not switch CR3:
it keeps pointing to the kernel page tables, which already map both user
and kernel pages. This avoids a TLB flush on exit and saves another one
on the next entry.

Thanks to these changes, haproxy running under KVM went back from
12700 conn/s (without PCID) or 19700 (with PCID) to 23100 once loaded
after calling prctl(), indicating that PTI has no measurable impact on
this workload.

Signed-off-by: Willy Tarreau <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kees Cook <[email protected]>

v3:
  - switched back to using a task flag
v2:
  - use pti_disable instead of task flag
---
 arch/x86/entry/calling.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 19c6790..563478d 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/jump_label.h>
+#include <asm/thread_info.h>
 #include <asm/unwind_hints.h>
 #include <asm/cpufeatures.h>
 #include <asm/page_types.h>
@@ -229,6 +230,12 @@
 
 .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
 	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	/* This task may be exempt from PTI */
+	movq	PER_CPU_VAR(current_task), \scratch_reg
+	btq	$TIF_DISABLE_PTI_NOW, TASK_TI_flags(\scratch_reg)
+	jc	.Lend_\@
+
 	mov	%cr3, \scratch_reg
 
 	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
-- 
1.7.12.1

