On Sat, Dec 20, 2014 at 8:24 AM, Aliaksey Kandratsenka <
[email protected]> wrote:
>
>
>
> On Sat, Dec 20, 2014 at 6:18 AM, Tim Deegan <[email protected]> wrote:
>>
>> Hi all,
>>
>> While looking into a report about problems with libunwind unwinding
>> from __lll_unlock_wake(), I noticed that I was getting some traces
>> that didn't unwind at all, from one particular address in
>> __lll_unlock_wake().
>>
>> On closer inspection, the one IP that fails is the first instruction
>> in the function, and the failure is that c->use_pre_instr is set,
>> causing the unwinder to look up the byte just before the function.
>>
>> I'm getting these traces from libprofiler/gperftools, which gets hold
>> of the interrupted context in its signal handler (as a ucontext_t *)
>> and passes it to unw_init_local() as its second argument.
>> unw_init_local() unconditionally sets use_pre_instr, which is correct
>> if the context came from unw_getcontext(), but not in this case.
>>
>> Disabling this path in libprofiler (and switching to a slower one that
>> calls unw_getcontext() and then discards the first few frames) fixes
>> the problem.
>>
>> Arguably this bug is in libprofiler, which ought not to be just
>> passing a ucontext_t * to unw_init_local (though I see that the man
>> page suggests this is OK for IA-64 at least).  But even so it would be
>> good if libprofiler could use the signal context it already has,
>> without setting use_prev_instr.
>>
>
> Hi. Thanks for reporting it. I was the guy who implemented that change in
> libprofiler to use signal's ucontext_t. We had some bug reports suggesting
> that occasionally libunwind crashes when unwinding across signal handler
> boundary. And my thought was that passing ucontext_t that is passed to
> signal handler would work around that. I.e. by pointing libunwind exactly
> at frame right before signal handler frame.
>
> Most likely original bug was only relevant for older systems and I've not
> had any evidence that my change helps. So I'll simply revert that change in
> profiler.
>
>
But looks like that "use_prev_instr" thing is causing another interesting
situation. E.g.: http://i.imgur.com/3rE7cpj.png (this is from pprof --web
--addresses after I've effectively reverted "use signal's ucontext_t" patch)

This is caused by profiler always putting instruction pointer at top of
stack trace (if it's different from what stacktrace capturing function
returns). And smaller by one addresses actually point in the middle of
instructions e.g:

  400631: 44 38 c2             cmp    %r8b,%dl
  400634: 7d f2                 jge    400628 <q6+0x68>
  400636: 48 89 c7             mov    %rax,%rdi
  400639: e8 32 ff ff ff       callq  400570 <swap>
  40063e: 4c 89 e6             mov    %r12,%rsi
  400641: 48 89 df             mov    %rbx,%rdi
  400644: e8 37 ff ff ff       callq  400580 <reverse>
  400649: 5b                   pop    %rbx

You can see for example that address 644 is "right" one and that is what
top of stack contains (as noted above it's taken rip field from signal's
ucontext) and first libunwind address is 643 which is off by one.

Even more interesting is that _all addresses_ appear to be off by one when
returned from libunwind. And I'm seeing same behavior from glibc's
backtrace() call (which is in turn using libgcc's _Unwind_backtrace
facility).

I'll have to think a bit more how to deal with that "off by one" issue.
Perhaps I'll simply extend existing "duplicate entry" logic to detect off
by one case. Or try to detect that unwinder's behavior and adjust
accordingly.

So perhaps there's indeed something to fix at libunwind end. Unfortunately
I'm not at all familiar with stack unwinding and why use_prev_instr might
be good idea. Can anybody share what might be reason for such behavior ?
_______________________________________________
Libunwind-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/libunwind-devel

Reply via email to