On Fri, Sep 23, 2016 at 12:43 AM, Jann Horn <[email protected]> wrote: > On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote: >> On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn <[email protected]> wrote: >> > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote: >> >> This will prevent a crash if get_wchan() runs after the task stack >> >> is freed. >> > >> > I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), >> > I think >> > they read from the saved userspace registers area at the top of the kernel >> > stack? >> > >> > Used on remote processes in: >> > vma_is_stack_for_task() (via /proc/$pid/maps) >> >> This isn't used in /proc/$pid/maps -- it's only used in >> /proc/$pid/task/$tid/maps. I wonder if anyone actually cares about it >> -- it certainly won't work reliably. >> >> I could pin the stack in vma_is_stack_for_task, but it seems >> potentially better to me to change it to vma_is_stack_for_current() >> and remove the offending caller in /proc, replacing it with "return >> 0". Thoughts? > > I just scrolled through the debian codesearch results for "\[stack\]" - > there seem to only be 105 across all of debian's packages, many of them > duplicates - and I didn't see any that looked like they used the tid map. > So I think this might work. > > ( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 ) > > >> > do_task_stat() (/proc/$pid/stat) >> >> Like this: >> >> mm = get_task_mm(task); >> if (mm) { >> vsize = task_vsize(mm); >> if (permitted) { >> eip = KSTK_EIP(task); >> esp = KSTK_ESP(task); >> } >> } >> >> Can we just delete this outright? It seems somewhere between mostly >> and entirely useless, and it also seems dangerous. Until very >> recently, on x86_64, this would have been a potential info leak, as >> SYSCALL followed closely by a hardware interrupt would cause *kernel* >> values to land in task_pt_regs(). I don't even want to think about >> what this code does if the task is in vm86 mode. I wouldn't be at all >> surprised if non-x86 architectures have all kinds of interesting >> thinks happen if you do this to a task that isn't running normal >> non-atomic kernel code at the time. >> >> I would advocate for unconditionally returning zeros in these two stat >> fields. > > I'd like that a lot. > > I guess the two things that might theoretically use it are ptrace users > and (very theoretically) sampling profiling stuff or so? > > In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but > it's behind an "#ifdef 0": > > #if 0 /* Don't know how architecture-dependent the rest is... > Anyway the signal bitmap info is available from "status". */ > if (fscanf (procfile, "%lu ", <mp) > 0) /* FIXME arch? */ > printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp); > if (fscanf (procfile, "%lu ", <mp) > 0) /* FIXME arch? */ > printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp); > [...] > > strace and ltrace don't seem to be using it.
Does CRIU use this? I wouldn't expect so, since they're using ptrace, IIUC, to freeze/restore. -Kees -- Kees Cook Nexus Security

