On Sat, Dec 20, 2014 at 8:24 AM, Aliaksey Kandratsenka < [email protected]> wrote: > > > > On Sat, Dec 20, 2014 at 6:18 AM, Tim Deegan <[email protected]> wrote: >> >> Hi all, >> >> While looking into a report about problems with libunwind unwinding >> from __lll_unlock_wake(), I noticed that I was getting some traces >> that didn't unwind at all, from one particular address in >> __lll_unlock_wake(). >> >> On closer inspection, the one IP that fails is the first instruction >> in the function, and the failure is that c->use_pre_instr is set, >> causing the unwinder to look up the byte just before the function. >> >> I'm getting these traces from libprofiler/gperftools, which gets hold >> of the interrupted context in its signal handler (as a ucontext_t *) >> and passes it to unw_init_local() as its second argument. >> unw_init_local() unconditionally sets use_pre_instr, which is correct >> if the context came from unw_getcontext(), but not in this case. >> >> Disabling this path in libprofiler (and switching to a slower one that >> calls unw_getcontext() and then discards the first few frames) fixes >> the problem. >> >> Arguably this bug is in libprofiler, which ought not to be just >> passing a ucontext_t * to unw_init_local (though I see that the man >> page suggests this is OK for IA-64 at least). But even so it would be >> good if libprofiler could use the signal context it already has, >> without setting use_prev_instr. >> > > Hi. Thanks for reporting it. I was the guy who implemented that change in > libprofiler to use signal's ucontext_t. We had some bug reports suggesting > that occasionally libunwind crashes when unwinding across signal handler > boundary. And my thought was that passing ucontext_t that is passed to > signal handler would work around that. I.e. by pointing libunwind exactly > at frame right before signal handler frame. > > Most likely original bug was only relevant for older systems and I've not > had any evidence that my change helps. So I'll simply revert that change in > profiler. > > But looks like that "use_prev_instr" thing is causing another interesting situation. E.g.: http://i.imgur.com/3rE7cpj.png (this is from pprof --web --addresses after I've effectively reverted "use signal's ucontext_t" patch)
This is caused by profiler always putting instruction pointer at top of stack trace (if it's different from what stacktrace capturing function returns). And smaller by one addresses actually point in the middle of instructions e.g: 400631: 44 38 c2 cmp %r8b,%dl 400634: 7d f2 jge 400628 <q6+0x68> 400636: 48 89 c7 mov %rax,%rdi 400639: e8 32 ff ff ff callq 400570 <swap> 40063e: 4c 89 e6 mov %r12,%rsi 400641: 48 89 df mov %rbx,%rdi 400644: e8 37 ff ff ff callq 400580 <reverse> 400649: 5b pop %rbx You can see for example that address 644 is "right" one and that is what top of stack contains (as noted above it's taken rip field from signal's ucontext) and first libunwind address is 643 which is off by one. Even more interesting is that _all addresses_ appear to be off by one when returned from libunwind. And I'm seeing same behavior from glibc's backtrace() call (which is in turn using libgcc's _Unwind_backtrace facility). I'll have to think a bit more how to deal with that "off by one" issue. Perhaps I'll simply extend existing "duplicate entry" logic to detect off by one case. Or try to detect that unwinder's behavior and adjust accordingly. So perhaps there's indeed something to fix at libunwind end. Unfortunately I'm not at all familiar with stack unwinding and why use_prev_instr might be good idea. Can anybody share what might be reason for such behavior ?
_______________________________________________ Libunwind-devel mailing list [email protected] https://lists.nongnu.org/mailman/listinfo/libunwind-devel
