On Mon, Jan 25, 2016 at 01:34:13PM -0800, Andy Lutomirski wrote:
> Signals are always delivered to 64-bit tasks with CS set to a long
> mode segment. In long mode, SS doesn't matter as long as it's a
> present writable segment.
>
> If SS starts out invalid (this can happen if the signal was caused
> by an IRET fault or was delivered on the way out of set_thread_area
> or modify_ldt), then IRET to the signal handler can fail, eventually
> killing the task.
>
> The straightforward fix would be to simply reset SS when delivering
> a signal. That breaks DOSEMU, though: 64-bit builds of DOSEMU rely
> on SS being set to the faulting SS when signals are delivered.
>
> As a compromise, this patch leaves SS alone so long as it's valid.
>
> The net effect should be that the behavior of successfully delivered
> signals is unchanged. Some signals that would previously have
> failed to be delivered will now be delivered successfully.
>
> This has no effect for x32 or 32-bit tasks: their signal handlers
> were already called with SS == __USER_DS.
>
> (On Xen, there's a slight hole: if a task sets SS to a writable
> *kernel* data segment, then we will fail to identify it as invalid
> and we'll still kill the task. If anyone cares, this could be fixed
> with a new paravirt hook.)
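To make the "leave SS alone so long as it's valid" rule concrete, here is a minimal userspace sketch of the mask-and-compare that force_valid_ss() performs on the LAR result further down in the patch. It only copies the AR_* values it needs from the patch and does not execute LAR itself; the helper names ss_ar_ok_for_64bit_user() and pick_signal_ss() are hypothetical, the default user SS is passed in as a parameter instead of using the kernel's __USER_DS, and a failed LAR is modelled as ar == 0, which is what the patch's asm produces.

```c
#include <assert.h>   /* assert(), used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Access-rights bits as returned by LAR, copied from the AR_* macros in
 * the quoted desc_defs.h hunk (only the ones this sketch needs). */
#define AR_TYPE_RWDATA          (1u * (1u << 9))
#define AR_TYPE_RWDATA_EXPDOWN  (3u * (1u << 9))
#define AR_TYPE_MASK            (7u * (1u << 9))
#define AR_DPL3                 (3u * (1u << 13))
#define AR_DPL_MASK             (3u * (1u << 13))
#define AR_S                    (1u << 12)  /* descriptor type: code/data */
#define AR_P                    (1u << 15)  /* segment present */

/*
 * Hypothetical helper: true if a LAR access-rights value describes an SS
 * that IRET to 64-bit user mode will accept: DPL 3, S set, P set, and type
 * read-write data or read-write expand-down data. Bits outside the mask
 * (accessed, AVL, L, D/B, G) are ignored. ar == 0 (LAR failed) is rejected.
 */
static bool ss_ar_ok_for_64bit_user(uint32_t ar)
{
	ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;
	return ar == (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) ||
	       ar == (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN);
}

/*
 * Hypothetical policy helper: keep the old SS if it is valid (what DOSEMU
 * relies on), otherwise fall back to the default user data selector.
 */
static uint16_t pick_signal_ss(uint16_t old_ss, uint32_t ar, uint16_t user_ds)
{
	return ss_ar_ok_for_64bit_user(ar) ? old_ss : user_ds;
}
```

Note that a writable DPL 0 data segment is rejected here only because of the DPL check; the Xen hole described above is about a kernel segment that the hardware check cannot see, which this pure-C model does not try to reproduce.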
>
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  arch/x86/include/asm/desc_defs.h | 23 ++++++++++++++++++
>  arch/x86/kernel/signal.c         | 51 ++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 72 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
> index 278441f39856..00971705a16d 100644
> --- a/arch/x86/include/asm/desc_defs.h
> +++ b/arch/x86/include/asm/desc_defs.h
> @@ -98,4 +98,27 @@ struct desc_ptr {
>
>  #endif /* !__ASSEMBLY__ */
>
> +/* Access rights as returned by LAR */
> +#define AR_TYPE_RODATA         (0 * (1 << 9))
> +#define AR_TYPE_RWDATA         (1 * (1 << 9))
> +#define AR_TYPE_RODATA_EXPDOWN (2 * (1 << 9))
> +#define AR_TYPE_RWDATA_EXPDOWN (3 * (1 << 9))
> +#define AR_TYPE_XOCODE         (4 * (1 << 9))
> +#define AR_TYPE_XRCODE         (5 * (1 << 9))
> +#define AR_TYPE_XOCODE_CONF    (6 * (1 << 9))
> +#define AR_TYPE_XRCODE_CONF    (7 * (1 << 9))
> +#define AR_TYPE_MASK           (7 * (1 << 9))
> +
> +#define AR_DPL0                (0 * (1 << 13))
> +#define AR_DPL3                (3 * (1 << 13))
> +#define AR_DPL_MASK            (3 * (1 << 13))
> +
> +#define AR_A                   (1 << 8)   /* A means "accessed" */
> +#define AR_S                   (1 << 12)  /* S means "not system" */

Ah, by "not system" you mean that S=0b makes it a system descriptor and
S=1b a user (code or data) descriptor. I think the SDM calls it, more
descriptively, the "S (descriptor type) flag", while the APM calls it
simply the S-field or S-bit.

I like "S (descriptor type) flag" more than "not system". :)

> +#define AR_P                   (1 << 15)  /* P means "present" */
> +#define AR_AVL                 (1 << 20)  /* AVL does nothing */

AVL = AVaiLable to software

> +#define AR_L                   (1 << 21)  /* L means "long mode" */
> +#define AR_DB                  (1 << 22)  /* D or B, depending on type */
> +#define AR_G                   (1 << 23)  /* G means "limit in pages" */

Please use the names from the processor manuals. G is the Granularity
bit. "limit in pages" is only clear to people who have already read the
Granularity bit description. :-)

>  #endif /* _ASM_X86_DESC_DEFS_H */
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index cb6282c3638f..bb3e4208d90d 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -61,6 +61,35 @@
>  	regs->seg = GET_SEG(seg) | 3;			\
>  } while (0)
>
> +#ifdef CONFIG_X86_64

You already have an #else /* !CONFIG_X86_32 */ block above the 64-bit
version of __setup_rt_frame(). Just put force_valid_ss() there without
the additional ifdef. That file's ifdeffery is beyond readability
anyway.

> +/*
> + * If regs->ss will cause an IRET fault, change it. Otherwise leave it
> + * alone. Using this generally makes no sense unless
> + * user_64bit_mode(regs) would return true.
> + */
> +static void force_valid_ss(struct pt_regs *regs)
> +{
> +	u32 ar;
> +	asm volatile ("lar %[old_ss], %[ar]\n\t"
> +		      "jz 1f\n\t"		/* If invalid: */
> +		      "xorl %[ar], %[ar]\n\t"	/* set ar = 0 */
> +		      "1:"
> +		      : [ar] "=r" (ar)
> +		      : [old_ss] "rm" ((u16)regs->ss));
> +
> +	/*
> +	 * For a valid 64-bit user context, we need DPL 3, type
> +	 * read-write data or read-write exp-down data, and S and P
> +	 * set. We can't use VERW because VERW doesn't check the
> +	 * P bit.
> +	 */
> +	ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;
> +	if (ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) &&
> +	    ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN))
> +		regs->ss = __USER_DS;
> +}
> +#endif
> +
>  int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
>  {
>  	unsigned long buf_val;
> @@ -459,10 +488,28 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
>
>  	regs->sp = (unsigned long)frame;
>
> -	/* Set up the CS register to run signal handlers in 64-bit mode,
> -	   even if the handler happens to be interrupting 32-bit code. */
> +	/*
> +	 * Set up the CS and SS registers to run signal handlers in
> +	 * 64-bit mode, even if the handler happens to be interrupting
> +	 * 32-bit or 16-bit code.
> +	 *
> +	 * SS is subtle. In 64-bit mode, we don't need any particular
> +	 * SS descriptor, but we do need SS to be valid. It's possible
> +	 * that the old SS is entirely bogus -- this can happen if the
> +	 * signal we're trying to deliver is #GP or #SS caused by a bad
> +	 * SS value. We also have a compatibility issue here: DOSEMU
> +	 * relies on the contents of the SS register indicating the
> +	 * SS value at the time of the signal, even though that code in
> +	 * DOSEMU predates sigreturn's ability to restore SS. (DOSEMU
> +	 * avoids relying on sigreturn to restore SS; instead it uses
> +	 * a trampoline.) So we do our best: if the old SS was valid,
> +	 * we keep it. Otherwise we replace it.
> +	 */
>  	regs->cs = __USER_CS;
>
> +	if (unlikely(regs->ss != __USER_DS))

So this is a fast path, AFAICT from reading the code and from adding a
gdb breakpoint here.

I guess we can't make this opt-in and patch it out when users don't
want to run DOSEMU. Or maybe we could add a CONFIG_CHECK_OLD_SS option
which defaults to y and which people can disable... so an opt-out
behavior :)

Hmmm...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.