On 02/24/2015 08:30 PM, Steven Rostedt wrote:
> On Tue, 24 Feb 2015 19:51:33 +0100
> Denys Vlasenko <dvlas...@redhat.com> wrote:
>
>> PER_CPU_VAR(kernel_stack) was set up in a way where it points
>> five stack slots below the top of stack.
>>
>> Presumably, it was done to avoid one "sub $5*8,%rsp"
>> in syscall/sysenter code paths, where iret frame needs to be
>> created by hand.
>>
>> Ironically, none of them benefit from this optimization,
>> since all of them need to allocate additional data on stack
>> (struct pt_regs), so they still have to perform subtraction.
>> And ia32_sysenter_target even needs to *undo* this optimization:
>> it constructs iret stack with pushes instead of movs,
>> so it needs to start right at the top.
>>
>> This patch eliminates KERNEL_STACK_OFFSET.
>> PER_CPU_VAR(kernel_stack) now points directly to top of stack.
>> pt_regs allocations are adjusted to allocate iret frame as well.
>>
>
> I always thought the KERNEL_STACK_OFFSET wasn't an optimization, but a
> buffer from the real top of stack, in case we had any off by one bugs,
> it wouldn't crash the system.
I was thinking about it, but it looks unlikely. Reasons:

(1) ia32_sysenter_target does "addq $(KERNEL_STACK_OFFSET),%rsp" on entry
before saving registers with PUSHes; this returns %rsp to the very top of
the kernel stack. If that were a problem (say, an NMI at this point would
do bad things), it would have been noticed by now.

(2) Even the ordinary 64-bit syscall path uses an IRET return at times,
for example on every execve and signal return (because they need to load
a modified %rsp). With the current layout, the return frame for IRET lies
exactly there, in those 5 stack slots "reserved" via the
KERNEL_STACK_OFFSET thingy.

(3) There are no comments anywhere about KERNEL_STACK_OFFSET being a
safety measure.