Hi,

On Mon, Feb 11, 2019 at 11:58 AM Dave Martin <dave.mar...@arm.com> wrote:
>
> On Mon, Feb 11, 2019 at 09:27:11AM -0800, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Feb 4, 2019 at 4:31 AM Dave Martin <dave.mar...@arm.com> wrote:
> > >
> > > On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > I was wondering if anyone out there has given any thought to
> > > > annotating the ARM64 IRQ handling in such a way that we could stack
> > > > crawl past el1_irq() when in gdb.
> > > >
> > > > I spent a bit of time on this a few months ago and documented all my
> > > > findings in:
> > > >
> > > > https://bugs.chromium.org/p/chromium/issues/detail?id=908721
> > > >
> > > > I can copy and paste all the discussion from that bug here, but since
> > > > it's public hopefully folks can read the discussion / investigation
> > > > there.  To put it briefly, though: I can stack crawl past "el1_irq"
> > > > with the normal linux stack crawl (which is what kdb uses) but I can't
> > > > crawl past "el1_irq" in gdb().  After talking to some of our tools
> > > > guys here I'm fairly certain that we could solve this with the right
> > > > CFI directives, but when I poked at it I wasn't able to figure out the
> > > > magic.
> > > >
> > > >
> > > > Anyway, I figured I'd check to see if anyone here happens to know the
> > > > right magic.
> > >
> > > The kernel (appears to) generate a valid frame record for el1_irq:
> > >
> > >    0xffffff8008082b94 <+84>:    mrs     x22, elr_el1
> > >
> > > [...]
> > >
> > >    0xffffff8008082ba0 <+96>:    stp     x29, x22, [sp, #304]
> > >    0xffffff8008082ba4 <+100>:   add     x29, sp, #0x130
> > >
> > > (I note that 0x130 == 304.  Yay binutils.)
> >
> > Right, this is how the kernel is able to do the crawl.  It's also why
> > I was able to manually do the crawl in the bug by chaining together
> > frame pointers.
> >
> >
> > > From the bug report, I don't see any real investigation into what
> > > precisely causes gdb to choke on this frame.
> >
> > Right.  I just don't know gdb well enough.  :(  I've had it on my list
> > to dig into it, but I need to find time.  ;-)
> >
> >
> > > Do you have evidence that CFI annotations help in this case?  And can
> > > you explain _why_ they help (i.e., precisely how is gdb relying on the
> > > annotations)?
> >
> > I spent a tiny bit of time playing around with CFI annotations.
> > Mostly it was stumbling around in the dark since I had a hard time
> > finding good arm/arm64 examples and the documentation was a little
> > hard for me to parse.
>
> You could try compiling a few simple C functions with gcc -S
> -fexceptions and see what the compiler spits out.

Thanks, this definitely helped!


> > ...but from my experience with gdb, my guess is that gdb wants more
> > than just the simple frame pointers.  It wants to know where _all_ the
> > registers are stored on the stack and the only way it's going to get
> > that from assembly code (especially assembly code that barfed the
> > registers onto the stack somewhere that's not between FUNC and
> > ENDFUNC) is with some type of annotation.  My guess is that it doesn't
> > fall back to just looking at frame pointer chains.  Specifically as
> > you move up the stack frame in gdb and you type "info reg", the set of
> > registers changes to be those registers that are correct for the stack
> > frame you're on.  Here's a quick example showing how gdb behaves with
> > a random register that was barfed, $x22:
> >
> > (gdb) frame 3
> > #3  0xffffff800846a088 in __handle_sysrq (key=103,
> > check_mask=<optimized out>) at .../drivers/tty/sysrq.c:620
> > 620 op_p->handler(key);
> >
> > (gdb) disass
> > Dump of assembler code for function __handle_sysrq:
> >    0xffffff8008469f64 <+0>:     str     x23, [sp, #-64]!
> >    0xffffff8008469f68 <+4>:     stp     x22, x21, [sp, #16]
> >    0xffffff8008469f6c <+8>:     stp     x20, x19, [sp, #32]
> >    0xffffff8008469f70 <+12>:    stp     x29, x30, [sp, #48]
> >    0xffffff8008469f74 <+16>:    add     x29, sp, #0x30
> >
> > (gdb) print /x $x22
> > $13 = 0xffffff8009035000
> >
> > (gdb) print /x *(void**)($x29 - 0x30 + 16)
> > $14 = 0x8000100
> >
> > (gdb) up
> > #4  0xffffff800846a0dc in handle_sysrq (key=103) at
> > .../drivers/tty/sysrq.c:649
> > 649 __handle_sysrq(key, true);
> >
> > (gdb) print /x $x22
> > $15 = 0x8000100
>
>
> Indeed, but this requires full DWARF or .eh_frame info, which is not
> generally available in the kernel.

Yup, but I have it for gdb and right now the problem I'm trying to
solve is being able to crawl in gdb since the kernel seems to be OK.
I guess I was thinking that perhaps the DWARF info could be confusing
gdb?

> Except for code built with -fomit-frame-pointer, you should at least
> be able to see a list of frames though: this doesn't require all the
> registers of ancestor frames to be recovered, just x29 and lr (which is
> what the frame records on the stack contain -- so no other magic info
> is required in order to recover these).
>
> gdb tries various methods to unwind a frame, and ought to fall back to
> this approach if all else fails.  Frame chains that appear to loop
> are a problem though, with no straightforward solution.
>
> My hunch is that gdb sees the frame chain attempt to loop backwards
> after el1_irq and bails out.  Is your task stack at a lower address than
> the IRQ stack?

Here's what I've got (not lower):

#16 0xffffff8008082bf0 in el1_irq () at
/mnt/host/source/src/third_party/kernel/v4.19/arch/arm64/kernel/entry.S:622
622 irq_handler

(gdb) print /x $sp
$11 = 0xffffff8008004000

(gdb) print /x $x29
$12 = 0xffffff8009003e90

(gdb) print /x ((void**)$x29)[0]
$13 = 0xffffff8009003ed0

(gdb) print /x (*(void***)$x29)[0]
$14 = 0xffffff8009003ee0

...but then I poked a bit more and found out that one really big
problem is that "irq_stack_entry" swaps the stack before calling
gic_handle_irq(), and this seemed to be confusing gdb.  Specifically,
the value of "sp" when I point gdb at the "el1_irq" frame is actually
"irq_stack_ptr" AKA 0xffffff8008004000.

I've been fighting a bit with trying to figure out how to make .cfi
directives do what I want and I managed a stupid/ugly hack that at
least seems to get my stack pointer to be correct in el1_irq now:

---

 static asmlinkage void __exception_irq_entry gic_handle_irq(struct
pt_regs *regs)
 {
        u32 irqnr;

+       asm volatile (".cfi_register 31, 19");

---

...when I do that, my stack pointer is sane when I point gdb at
el1_irq (it matches x19), but I still can't get a trace.  I also
haven't yet been able to figure out how to accomplish that without
hacking it into gic_handle_irq().

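For reference, here is a minimal C sketch of the frame-record walk
discussed above.  It assumes each record is just the {caller's x29,
saved lr} pair that the "stp x29, ..." instructions store.  The names
"struct frame_record" and "walk_frame_records()" are made up for
illustration; they are not kernel or gdb APIs.  The optional
"require_ascending" check models the kind of "chain goes backwards"
bail-out Dave guesses at; whether gdb really applies such a check is
not something I've confirmed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout of an AArch64 frame record: the caller's frame
 * pointer (x29) followed by the saved return address (lr).
 */
struct frame_record {
    struct frame_record *fp;  /* caller's x29; NULL ends the chain */
    void *lr;                 /* saved return address */
};

/*
 * Hypothetical helper: follow the chain starting at fp and return the
 * number of records visited, capped at max_frames so that a looping
 * chain cannot spin forever.
 *
 * If require_ascending is nonzero, stop as soon as the next record is
 * not at a strictly higher address than the current one.
 */
size_t walk_frame_records(const struct frame_record *fp,
                          size_t max_frames, int require_ascending)
{
    size_t visited = 0;

    while (fp != NULL && visited < max_frames) {
        const struct frame_record *next = fp->fp;

        visited++;

        /* Compare as integers; the records may live on different stacks. */
        if (require_ascending && next != NULL &&
            (uintptr_t)next <= (uintptr_t)fp)
            break;

        fp = next;
    }

    return visited;
}
```

With the strict check enabled, a chain that hops from one stack to
another at a lower address stops at the hop, while the permissive walk
keeps going until it hits NULL or the frame cap.
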
While it would be nice to get all this solved, it's probably not high
priority right now, so I might have to punt unless there's some other
obvious / low hanging fruit to try.

-Doug

_______________________________________________
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport