On Fri, Apr 5, 2019 at 11:38 PM kernel test robot <l...@intel.com> wrote:
>
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> WIP.x86/stackguards
>
> commit 8b275b3754465d502d393f8ae8dd355b7067e73f
> Author: Andy Lutomirski <l...@kernel.org>
> AuthorDate: Fri Jul 13 19:01:23 2018 -0700
> Commit: Thomas Gleixner <t...@linutronix.de>
> CommitDate: Fri Apr 5 17:04:10 2019 +0200
>
> x86/irq/64: Remap the IRQ stack with guard pages
>
> The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> will overwrite other data structures.
>
> Use vmap() to remap the IRQ stack so that it will have the usual guard
> pages that vmap/vmalloc allocations have. With this the kernel will panic
> immediately on an IRQ stack overflow.
>
> [ tglx: Move the map code to a proper place and invoke it only when a CPU
> is about to be brought online. No point in installing the map at
> early boot for all possible CPUs. Fail the CPU bringup if the vmap
> fails as done for all other preparatory stages in cpu hotplug. ]
>
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>

I haven't spotted the actual bug yet, but the faulting instruction is:

  2a: 65 8b 35 09 ca 75 63    mov %gs:*0x6375ca09(%rip),%esi    # 0x6375ca3a <-- trapping instruction

This seems to be faulting just above the top of the stack (the thing in
RSP), so I suspect that there is some path that is shoving the remapped
value into GSBASE, which is wrong.

Also, FWIW, there was some reason that I initialized all the virtual
mappings for all possible CPUs early. I don't remember what it was, and
it may not have been a good reason, but I put at least some nonzero
amount of thought into it :)

--Andy
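
For readers who want to check the disassembly line quoted in the reply, the following is a minimal sketch (not part of the original message) that decodes the seven instruction bytes by hand. The function name and constants are made up for illustration. The offset 0x2a is relative to the start of the dumped code bytes, not a real kernel address, so the computed 0x6375ca3a only reproduces the annotation in the disassembly; the address actually accessed at run time would be GSBASE plus the real RIP-relative target.

```python
import struct

# Values copied from the disassembly line in the report.
INSN_OFFSET = 0x2A                               # offset within the dumped code bytes
INSN_BYTES = bytes.fromhex("65 8b 35 09 ca 75 63")


def decode_gs_rip_relative_mov(code: bytes, offset: int):
    """Decode `mov %gs:disp32(%rip),%r32`; return (reg, disp32, target).

    Hypothetical helper: it handles only this one encoding.
    """
    if code[0] != 0x65:                          # GS segment-override prefix
        raise ValueError("expected GS prefix")
    if code[1] != 0x8B:                          # MOV r32, r/m32
        raise ValueError("expected opcode 0x8b")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm != 5:                      # mod=00, rm=101: RIP-relative in 64-bit mode
        raise ValueError("not a RIP-relative operand")
    disp = struct.unpack_from("<i", code, 3)[0]  # signed little-endian disp32
    next_rip = offset + len(code)                # RIP-relative uses the next instruction's address
    return reg, disp, next_rip + disp


reg, disp, target = decode_gs_rip_relative_mov(INSN_BYTES, INSN_OFFSET)
print(f"reg={reg} disp={disp:#x} target={target:#x}")
```

Register number 6 in the ModRM reg field corresponds to %esi, which matches the destination operand in the disassembly.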