On Mon, Mar 23, 2015 at 3:56 AM, Ingo Molnar <[email protected]> wrote: > > * Brian Gerst <[email protected]> wrote: > >> Both the execve and sigreturn family of syscalls have the ability to change >> registers in ways that may not be compatabile with the syscall path they >> were called from. In particular, sysret and sysexit can't handle non-default >> %cs and %ss, and some bits in eflags. These syscalls have stubs that are >> hardcoded to jump to the iret path, and not return to the original syscall >> path. Commit 76f5df43cab5e765c0bd42289103e8f625813ae1 (Always allocate a >> complete "struct pt_regs" on the kernel stack) recently changed this for >> some 32-bit compat syscalls, but introduced a bug where execve from a 32-bit >> program to a 64-bit program would fail because it still returned via sysretl. >> This caused Wine to fail when built for both 32-bit and 64-bit. >> >> This patch sets TIF_NOTIFY_RESUME for execve and sigreturn so that the iret >> path is always taken on exit to userspace. >> >> Signed-off-by: Brian Gerst <[email protected]> >> Cc: Ingo Molnar <[email protected]> >> Cc: Denys Vlasenko <[email protected]> >> Cc: Andy Lutomirski <[email protected]> >> Cc: Borislav Petkov <[email protected]> >> Cc: H. Peter Anvin <[email protected]> >> Cc: Linus Torvalds <[email protected]> >> --- >> arch/x86/ia32/ia32_signal.c | 2 ++ >> arch/x86/include/asm/ptrace.h | 2 +- >> arch/x86/include/asm/thread_info.h | 7 +++++++ >> arch/x86/kernel/process_32.c | 6 +----- >> arch/x86/kernel/process_64.c | 1 + >> arch/x86/kernel/signal.c | 2 ++ >> 6 files changed, 14 insertions(+), 6 deletions(-) > > Applied the fix to tip:x86/asm, thanks Brian! > >> + >> +/* >> + * force syscall return via iret by making it look as if there was >> + * some work pending. >> +*/ >> +#define force_iret() set_thread_flag(TIF_NOTIFY_RESUME) > > I extended this comment to: > > /* > * Force syscall return via IRET by making it look as if there was > * some work pending. IRET is our most capable (but slowest) syscall > * return path, which is able to restore modified SS, CS and certain > * EFLAGS values that other (fast) syscall return instructions > * are not able to restore properly. > */ > #define force_iret() set_thread_flag(TIF_NOTIFY_RESUME) > > Just to preserve the underlying reason for force_iret() for the future > and such. > > Btw., it might be a worthwile optimization to detect non-standard SS, > CS and EFLAGS values and only force_iret() in that case, that will > speed up 99.9999% of execve() and sigreturn() syscalls and only force > the 'weird' process startup modes into the slow return path.
sysret/sysexit also can't restore rcx and r11/rdx. This would not work for execve, since it sets those registers to zero. It could possibly work for sigreturn if the signal interrupted a syscall. We already have the opportunistic sysret code for 64-bit returns. -- Brian Gerst -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

