On Tue, Aug 25, 2020 at 12:32 PM Andy Lutomirski <l...@kernel.org> wrote: > > On Tue, Aug 25, 2020 at 11:50 AM Kyle Huey <m...@kylehuey.com> wrote: > > > > On Tue, Aug 25, 2020 at 10:31 AM Kyle Huey <m...@kylehuey.com> wrote: > > > > > > On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski <l...@kernel.org> wrote: > > > > > > > > On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey <m...@kylehuey.com> wrote: > > > > > > > > > > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski <l...@amacapital.net> > > > > > wrote: > > > > > > I don’t like this at all. Your behavior really shouldn’t depend on > > > > > > whether the new instructions are available. Also, some day I would > > > > > > like to change Linux to have the new behavior even if FSGSBASE > > > > > > instructions are not available, and this will break rr again. (The > > > > > > current !FSGSBASE behavior is an ugly optimization of dubious value. > > > > > > I would not go so far as to describe it as correct.) > > > > > > > > > > Ok. > > > > > > > > > > > I would suggest you do one of the following things: > > > > > > > > > > > > 1. Use int $0x80 directly to load 32-bit regs into a child. This > > > > > > might dramatically simplify your code and should just do the right > > > > > > thing. > > > > > > > > > > I don't know what that means. > > > > > > > > This is untested, but what I mean is: > > > > > > > > static int ptrace32(int req, pid_t pid, int addr, int data) { > > > > int ret; > > > > /* new enough kernels won't clobber r8, etc. */ > > > > asm volatile ("int $0x80" : "=a" (ret) : "a" (26 /* ptrace */), "b" > > > > (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10", > > > > "r11"); > > > > return ret; > > > > } > > > > > > > > with a handful of caveats: > > > > > > > > - This won't compile with -fPIC, I think. Instead you'll need to > > > > write a little bit of asm to set up and restore ebx yourself. gcc is > > > > silly like this. > > > > > > > > - Note that addr is an int. You'll need to mmap(..., MAP_32BIT, ...) > > > > to get a buffer that can be pointed to with an int. > > > > > > > > The advantage is that this should work on all kernels that support > > > > 32-bit mode at all. > > > > > > > > > > > > > > > 2. Something like your patch but make it unconditional. > > > > > > > > > > > > 3. Ask for, and receive, real kernel support for setting FS and GS > > > > > > in > > > > > > the way that 32-bit code expects. > > > > > > > > > > I think the easiest way forward for us would be a PTRACE_GET/SETREGSET > > > > > like operation that operates on the regsets according to the > > > > > *tracee*'s bitness (rather than the tracer, as it works currently). > > > > > Does that sound workable? > > > > > > > > > > > > > Strictly speaking, on Linux, there is no unified concept of a task's > > > > bitness, so "set all these registers according to the target's > > > > bitness" is not well defined. We could easily give you a > > > > PTRACE_SETREGS_X86_32, etc, though. > > > > > > In the process of responding to this I spent some time doing code > > > inspection and discovered a subtlety in the ptrace API that I was > > > previously unaware of. PTRACE_GET/SETREGS use the regset views > > > corresponding to the tracer but PTRACE_GET/SETREGSET use the regset > > > views corresponding to the tracee. This means it is possible for us > > > today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee > > > combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of > > > PTRACE_SETREGS. > > > > Alright I reverted the previous changes and switched us to use > > PTRACE_SETREGSET with NT_PRSTATUS[0] and our 32 bit tests pass again. > > I assume this behavior will remain unchanged indefinitely even when > > the fs/gsbase manipulation instructions are not available since the 32 > > bit user_regs_struct can't grow? > > Correct. > > I think it would be reasonable to add new, less erratic PTRACE_SETxyz > modes, but the old ones will stay as is. > > Strictly speaking, the issue you had wasn't a ptrace change. It was a > context switching change. Before the change, you were programming > garbage into the tracee state, but the context switch code wasn't > capable of restoring the garbage correctly and instead happened to > restore the state you actually wanted. With the new code, the context > switch actually worked correctly (for some value of correctly), and > the tracee crashed. Not that this distinction really matters.
Yes. We've been feeding the kernel trash for years. You were just letting us get away with it before ;) - Kyle