On Fri, Mar 27, 2015 at 7:17 AM, Ingo Molnar <[email protected]> wrote:
>
> * Denys Vlasenko <[email protected]> wrote:
>
>> On 03/27/2015 09:11 AM, Ingo Molnar wrote:
>> >
>> > * Denys Vlasenko <[email protected]> wrote:
>> >
>> >> This change makes the check exact (no more false positives
>> >> on kernel addresses).
>> >>
>> >> It isn't really important to be fully correct here -
>> >> almost all addresses we'll ever see will be userspace ones,
>> >> but OTOH it looks to be cheap enough:
>> >> the new code uses two more ALU ops but preserves %rcx,
>> >> allowing us to not reload it from pt_regs->cx again.
>> >> At the disassembly level, the changes are:
>> >>
>> >> cmp %rcx,0x80(%rsp)  ->  mov 0x80(%rsp),%r11; cmp %rcx,%r11
>> >> shr $0x2f,%rcx       ->  shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
>> >> mov 0x58(%rsp),%rcx  ->  (eliminated)
>> >>
>> >> Signed-off-by: Denys Vlasenko <[email protected]>
>> >> CC: Borislav Petkov <[email protected]>
>> >> CC: [email protected]
>> >> CC: [email protected]
>> >> ---
>> >>
>> >> Andy, I'm undecided myself on the merits of doing this.
>> >> If you like it, feel free to take it in your tree.
>> >> I trimmed the CC list to not bother too many people with this trivial
>> >> and quite possibly "useless churn"-class change.
>> >>
>> >>  arch/x86/kernel/entry_64.S | 23 ++++++++++++-----------
>> >>  1 file changed, 12 insertions(+), 11 deletions(-)
>> >>
>> >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> >> index bf9afad..a36d04d 100644
>> >> --- a/arch/x86/kernel/entry_64.S
>> >> +++ b/arch/x86/kernel/entry_64.S
>> >> @@ -688,26 +688,27 @@ retint_swapgs:		/* return to user-space */
>> >>  	 * a completely clean 64-bit userspace context.
>> >>  	 */
>> >>  	movq RCX(%rsp),%rcx
>> >> -	cmpq %rcx,RIP(%rsp)		/* RCX == RIP */
>> >> +	movq RIP(%rsp),%r11
>> >> +	cmpq %rcx,%r11			/* RCX == RIP */
>> >>  	jne opportunistic_sysret_failed
>> >
>> > Btw., in the normal syscall entry path, RIP(%rsp) == RCX(%rsp),
>> > because we set up pt_regs like that - and at this point RIP/RCX is
>> > guaranteed to be canonical, right?
>> >
>> > So if there's a mismatch generated, it's the kernel's doing.
>>
>> This is an optimization on the IRET exit code path.
>>
>> We go here if we know that pt_regs can be modified by e.g. ptrace.
>>
>> I think we also go here even on interrupt return.
>
> Yeah, missed that, this would kill any flag-based approach.
>
>> (Granted, chances that RCX was the same as RIP at the moment of
>> interrupt are slim, but we still would check that and (ab)use SYSRET
>> if it looks like it'll work).
>
> Btw., there's a neat trick we could do: in the HLT, MWAIT and
> ACPI-idle code we could attempt to set up RCX to match RIP, to trigger
> this optimization in the common 'irq interrupted the idle task' case?
sysret only returns to CPL3.

--
Brian Gerst
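
For reference, the new shl $0x10 / sar $0x10 / cmp sequence quoted above is an exact canonicality test: it sign-extends bit 47 into bits 63:48 and compares the result with the original value, whereas the old shr $0x2f only accepted addresses whose bits 63:47 were all clear (hence the "false positives on kernel addresses" in the changelog). A minimal userspace sketch of the same test in C, assuming 48-bit virtual addresses (4-level paging); is_canonical_48 is just an illustrative name, not a kernel helper:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * On x86-64 with 48-bit virtual addresses, an address is canonical
 * iff bits 63:48 are copies of bit 47.  Shifting left by 16 and then
 * arithmetic-shifting right by 16 (what sar does in hardware, and what
 * gcc/clang generate for a signed right shift) sign-extends bit 47;
 * comparing with the original value gives the exact check.
 */
static bool is_canonical_48(uint64_t addr)
{
	int64_t sext = (int64_t)(addr << 16) >> 16;

	return (uint64_t)sext == addr;
}

int main(void)
{
	printf("%d\n", is_canonical_48(0x00007fffffffffffULL)); /* 1: top of user half */
	printf("%d\n", is_canonical_48(0x0000800000000000ULL)); /* 0: non-canonical hole */
	printf("%d\n", is_canonical_48(0xffff800000000000ULL)); /* 1: canonical kernel range */
	return 0;
}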

