On Sat, Dec 20, 2014 at 1:57 PM, Aliaksey Kandratsenka <
[email protected]> wrote:
>
>
>
> On Sat, Dec 20, 2014 at 8:24 AM, Aliaksey Kandratsenka <
> [email protected]> wrote:
>>
>>
>>
>> On Sat, Dec 20, 2014 at 6:18 AM, Tim Deegan <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> While looking into a report about problems with libunwind unwinding
>>> from __lll_unlock_wake(), I noticed that I was getting some traces
>>> that didn't unwind at all, from one particular address in
>>> __lll_unlock_wake().
>>>
>>> On closer inspection, the one IP that fails is the first instruction
>>> in the function, and the failure is that c->use_pre_instr is set,
>>> causing the unwinder to look up the byte just before the function.
>>>
>>> I'm getting these traces from libprofiler/gperftools, which gets hold
>>> of the interrupted context in its signal handler (as a ucontext_t *)
>>> and passes it to unw_init_local() as its second argument.
>>> unw_init_local() unconditionally sets use_pre_instr, which is correct
>>> if the context came from unw_getcontext(), but not in this case.
>>>
>>> Disabling this path in libprofiler (and switching to a slower one that
>>> calls unw_getcontext() and then discards the first few frames) fixes
>>> the problem.
>>>
>>> Arguably this bug is in libprofiler, which ought not to be just
>>> passing a ucontext_t * to unw_init_local (though I see that the man
>>> page suggests this is OK for IA-64 at least).  But even so it would be
>>> good if libprofiler could use the signal context it already has,
>>> without setting use_prev_instr.
>>>
>>
>> Hi. Thanks for reporting it. I was the guy who implemented that change in
>> libprofiler to use signal's ucontext_t. We had some bug reports suggesting
>> that occasionally libunwind crashes when unwinding across signal handler
>> boundary. And my thought was that passing ucontext_t that is passed to
>> signal handler would work around that. I.e. by pointing libunwind exactly
>> at frame right before signal handler frame.
>>
>> Most likely original bug was only relevant for older systems and I've not
>> had any evidence that my change helps. So I'll simply revert that change in
>> profiler.
>>
>>
> But looks like that "use_prev_instr" thing is causing another interesting
> situation. E.g.: http://i.imgur.com/3rE7cpj.png (this is from pprof --web
> --addresses after I've effectively reverted "use signal's ucontext_t" patch)
>
> This is caused by profiler always putting instruction pointer at top of
> stack trace (if it's different from what stacktrace capturing function
> returns). And smaller by one addresses actually point in the middle of
> instructions e.g:
>
>   400631: 44 38 c2             cmp    %r8b,%dl
>   400634: 7d f2                 jge    400628 <q6+0x68>
>   400636: 48 89 c7             mov    %rax,%rdi
>   400639: e8 32 ff ff ff       callq  400570 <swap>
>   40063e: 4c 89 e6             mov    %r12,%rsi
>   400641: 48 89 df             mov    %rbx,%rdi
>   400644: e8 37 ff ff ff       callq  400580 <reverse>
>   400649: 5b                   pop    %rbx
>
> You can see for example that address 644 is "right" one and that is what
> top of stack contains (as noted above it's taken rip field from signal's
> ucontext) and first libunwind address is 643 which is off by one.
>
> Even more interesting is that _all addresses_ appear to be off by one when
> returned from libunwind. And I'm seeing same behavior from glibc's
> backtrace() call (which is in turn using libgcc's _Unwind_backtrace
> facility).
>
> I'll have to think a bit more how to deal with that "off by one" issue.
> Perhaps I'll simply extend existing "duplicate entry" logic to detect off
> by one case. Or try to detect that unwinder's behavior and adjust
> accordingly.
>
> So perhaps there's indeed something to fix at libunwind end. Unfortunately
> I'm not at all familiar with stack unwinding and why use_prev_instr might
> be good idea. Can anybody share what might be reason for such behavior ?
>
>
Thanks for raising this again. I've found a number of issues in profiler
code. "-1" behavior in non-top frames is caused by pprof. Given that it's
all call instructions it should be ok. Perhaps they use it to highlight
right instructions on assembly dump.

I'll be pushing a number of improvements as a result of that investigation
soon. One of them is revert of patch that was originally mentioned in this
thread.

Thanks a lot.
_______________________________________________
Libunwind-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/libunwind-devel

Reply via email to