Sounds like it's been pinned down then. Just confirming this: On Tue, Dec 26, 2017 at 06:16:37PM -0800, Linus Torvalds wrote: > On Tue, Dec 26, 2017 at 3:19 PM, Alexandru Chirvasitu > <achirva...@gmail.com> wrote: > > > > I went back to the initial problematic commit e802a51 and modified it as > > you suggest: > > Thank you. > > > This did not work out for me, but now it fails differently. Both > > (kexec -l + kexec -e) and (kexec -p + echo c > /proc/sysrq-trigger) > > end in call traces and freezes. > > > > It does seem to be tied to idt_invalidate. One of the last things I > > see on the screen (which is ends up frozen with the computer inactive) > > is > > > > EIP: idt_invalidate+0x6/0x40 SS:ESP: 0068:f6c47cd0 > > Yes, interesting, it's the stack canary load access there: > > mov %gs:0x14,%edx > > that traps. > > And that actually makes a lot of sense: the load_segments() call just > above has rloaded all segments with __KERNEL_DS. > > So while the stack canary access *intends* to load it from the magic > stack canary segment (offset 0x14), we've just reset all segments to > the standard zero-based full-sized ones, and obviously that will take > a page fault at 0x14. > > And the reason you now actually *see* the page fault is that we > haven't completely buggered the CPU state now, so the trap handler > actually works. With the GDT reset before, it used to take that same > trap, but now the trap handler itself would fault, and cause a triple > fault - which resets the machine. > > So it wasn't actually tracing, it was the stack canary all along. So > at least it's truly root-caused now. > > But the fix is the same: we just can't afford to do any function calls. > > Alternatively, we should just fix that insane "load_segments()". I'm > not sure why the code insists on reloading the segments in the first > place. > > So you could try just to remove the "load_segments()" line entirely. >
This works. The kernel that was showing me the page fault, with the load_segments line removed from machine_kexec_32, has no issues now. This is the diff to that prior commit: -------------------------------------------------------- remove load_segments line from kexec diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c index 71bd3c0..7d5e655 100644 --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -218,17 +218,6 @@ void machine_kexec(struct kimage *image) << PAGE_SHIFT); /* - * The segment registers are funny things, they have both a - * visible and an invisible part. Whenever the visible part is - * set to a specific selector, the invisible part is loaded - * with from a table in memory. At no other time is the - * descriptor table in memory accessed. - * - * I take advantage of this here by force loading the - * segments, before I zap the gdt with an invalid value. - */ - load_segments(); - /* * The gdt & idt are now invalid. * If you want to load them you must set up your own idt & gdt. */ -------------------------------------------------------- > Thanks for spending the time testing things out, > Well, it sure as hell is nothing if not instructive. My pleasure. Alex