Hi, Lassi,
    I am also working on a profiling tool using libunwind, I also met the
similar issue in signal-handler when I tried to extract the stack trace, my
stacktrace is:
Program received signal SIGSEGV, Segmentation fault.

[Switching to Thread 0x7fff18d8b910 (LWP 2022)]
access_mem (as=0x7ffff63961a0, addr=8, val=0x7fff18d89548, write=0,
arg=0x7fff18d89d90) at x86_64/Ginit.c:167
167           *val = *(unw_word_t *) addr;
(gdb) bt
#0  access_mem (as=0x7ffff63961a0, addr=8, val=0x7fff18d89548, write=0,
arg=0x7fff18d89d90) at x86_64/Ginit.c:167
#1  0x00007ffff618ae00 in dwarf_get (c=<value optimized out>,
rs=0x7fff18d89b00) at ../include/tdep-x86_64/libunwind_i.h:137
#2  apply_reg_state (c=<value optimized out>, rs=0x7fff18d89b00) at
dwarf/Gparser.c:747
#3  0x00007ffff618cada in _Ux86_64_dwarf_find_save_locs (c=0x7fff18d89d90)
at dwarf/Gparser.c:821
#4  0x00007ffff618d8b9 in _Ux86_64_dwarf_step (c=0x7ffff63961a0) at
dwarf/Gstep.c:34
#5  0x00007ffff618f6fa in _Ux86_64_step (cursor=0x7ffff63961a0) at
x86_64/Gstep.c:42
#6  0x00007ffff7bd80bb in get_backtrace (buffer=0x7fff18d8a6f0, n=10) at
racez_pmuprofiler.cc:342
#7  0x00007ffff7bd819b in signal_handler (sig=29, info=0x7fff18d8a8b0,
context=0x7fff18d8a780) at racez_pmuprofiler.cc:375
#8  <signal handler called>
#9  0x00007ffff69e91be in __tzfile_compute (timer=0, use_localtime=0,
leap_correct=<value optimized out>,
    leap_hit=0x7fff18d8ac0c, tp=0x7fff18d8ac50) at tzfile.c:794
#10 0x00000000004c6d69 in explode_time (xt=Cannot access memory at address
0xffffffffffffffa8
) at time/unix/time.c:92
Backtrace stopped: previous frame inner to this frame (corrupt stack?)


If I turn off the stacktrace, the program runs very well, So I believe it's
not the problem of my programs.
My setting is:
ubuntu 9.10, X8664, 2.6.30 kernel with perfmon patch.

I do not know if my problem is same as yours, but I really do not know how
to debug such problem. if you have some progress, I will try your patch ;-).

Tianwei
On Mon, Mar 8, 2010 at 11:24 PM, Lassi Tuura <[email protected]> wrote:

> Hi,
>
> I maintain a profiling utility which on x86-64 uses 6ms SIGPROF itimers and
> libunwind to sample the full stack trace for each tick. I would appreciate
> any help with the following issues, seen on RHEL5-derived systems.
>
> About 10% of the signals can capture stack trace only one or few levels
> below the signal frame.  Some of those involve junk return addresses (e.g.
> 0xa6, or an address on heap or on stack), and some have impossible call
> stacks (function x calling y in impossible ways). So far all "bad" stacks
> I've investigated involve a truncated stack trace, i.e. it didn't finish in
> _start. I have not yet come across these problems when signals are not
> involved.
>
> Does anyone have any words of wisdom why this occurs? I've been debugging
> this for some time, including decoding dwarf frame data and disassembly, but
> do not have a firm technical explanation yet. I have attached some example
> stacks, with annotations of which functions they are.
>
> My interpretation so far is that __return_rt unwind info is correct, but
> stack trace is unreliable/unstable across the signal frame. I would
> appreciate input from anyone who knows for sure what the problems really
> are.
>
> The problems I've seen so far:
>
> - ~2.3% represent various math functions (isnan, __ieee754_log10 etc.)
> immediately above signal frame.
>
> - ~2.0% ends in _init or __do_global_ctors_aux, i.e. calling shared library
> global constructors on dlopen().
>
> - Several end at f...@plt immediately above the signal frame, i.e. PLT
> entry interrupted by signal is non-traceable.
>
> - Several have bogus return address (e.g. < 0x4000, on heap or on stack)
> for the second frame above the signal frame. The few I investigated in
> detail also GDB reports incorrectly, but does manage to get the stack
> straightened plausibly in the end (see example below).
>
> - Suspiciously large fraction of failures occur at (function+0), i.e. at
> function entry address.
>
> These do not seem to explain everything, and I am not sure if they are
> supposed to be known flaws. Is it possible the unwind information is
> fundamentally inaccurate for async signals, and insufficient to arbitrarily
> trace the stack? Do some of the above ones represent fundamental known
> limitations?
>
> FWIW, we don't have these problems on ia32 on the same system. We don't use
> libunwind there, just simple EBP frame following.
>
> Examples below of various problems.
>
> Regards,
> Lassi
>
> (a)
>
> Stack trace using libunwind:
> UW#1  : 0x2b619a5ffc5f issig=0
> UW#2  : 0x2b619a603260 issig=0
> UW#3  : 0x3067c0e930 issig=1
> UW#4  : 0x2b619d08bbf3 issig=0
> UW#5  : 0x0000126c
> (SIGSEGV calling unw_is_signal_frame())
>
> GDB back trace for this crash:
>
> #4  <signal handler called>
> #5  access_mem (as=0x2b619d118bc0, addr=4716, val=0x7fff6e571990, write=0,
>    arg=0x7fff6e5719c0) at ../../../libunwind/src/x86_64/Ginit.c:167
> #6  0x00002b619d1113c4 in _ULx86_64_is_signal_frame (
>    cursor=<value optimized out>)
>    at ../../../libunwind/src/x86_64/Gis_signal_frame.c:53
> #7  0x00002b619a5ffd07 in IgHookTrace::stacktrace (
>    addresses=<value optimized out>, nmax=800)
>    at /afs/cern.ch/user/l/lat/dev/igprof/igprof/src/IgHookTrace.cc:250
> #8  0x00002b619a603260 in profileSignalHandler ()
>    at /afs/cern.ch/user/l/lat/dev/igprof/igprof/src/IgProfPerf.cc:46
> #9  <signal handler called>
> #10 0x00002b619d08bbf3 in std::string::assign (this=<value optimized out>,
>    __s...@0x17e0a3a0)
>    at
> /build/3jm/BUILD/slc5_amd64_gcc434/external/gcc/4.3.4/gcc-4.3.4/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:252
> #11 0x000000000000126c in ?? ()
> #12 0x0000000017e0a710 in ?? ()
> #13 0x0000000017e013b0 in ?? ()
> #14 0x0000000017df8050 in ?? ()
> #15 0x0000000017e013b0 in ?? ()
> #16 0x0000000017f768f0 in ?? ()
> #17 0xffffffffffff9000 in ?? ()
> #18 0xffffffffffffc800 in ?? ()
> #19 0x0000000017e013b0 in ?? ()
> #20 0x0000000017df8050 in ?? ()
> #21 0x0000000017e0a710 in ?? ()
> #22 0x000000000000126c in ?? ()
> #23 0x0000000017f6d590 in ?? ()
> #24 0x0000000017becd20 in ?? ()
> #25 0x00002b619abbb8d3 in
> std::__stable_sort_adaptive<__gnu_cxx::__normal_iterator<edmplugin::PluginInfo*,
> std::vector<edmplugin::PluginInfo, std::allocator<edmplugin::PluginInfo> >
> >, edmplugin::PluginInfo*, long, edmplugin::(anonymous
> namespace)::CompPluginInfos> ()
>   from /afs/
> cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
> #26 0x00002b619abbba18 in edmplugin::CacheParser::read ()
>   from /afs/
> cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
> #27 0x00002b619abc77c7 in edmplugin::PluginManager::PluginManager ()
>   from /afs/
> cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
> #28 0x00002b619abc7f3e in edmplugin::PluginManager::configure ()
>   from /afs/
> cern.ch/cms/sw/slc5_amd64_gcc434/cms/cmssw/CMSSW_3_5_0/lib/slc5_amd64_gcc434/libFWCorePluginManager.so
> #29 0x000000000040b591 in main ()
>
> (b) Example truncated stack trace
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0 <_ZN11IgHookTrace10stacktraceEPPvi+53>
>  UW#2  : 0x2b9997702260 issig=0 <profileSignalHandler+103>
>  UW#3  : 0x3067c0e930 issig=1   <__restore_rt>
>  UW#4  : 0x2b9999f91e60 issig=0 <type_new+0> Objects/typeobject.c:2045
>  UWEND 0
>
> (c) Example stack traces ending up with UNW_REG_IP on stack or heap.
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999fda981 issig=0 <cleanreturn+49> Python/getargs.c:191
>  UW#5  : 0x7fff83a982c8 issig=0 [stack: 7fff83a71000-7fff83aae000]
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f671a2 issig=0 <lookdict+2> Objects/dictobject.c:342
>  UW#5  : 0x1a2c6230 issig=0     [heap 19557000-1ab6f000]
>  UWEND 0
>
> (d) Various other failed stack traces, with functions labelled. Surrounding
> functions of stacks of ~50 levels deep, and in any case end up in _start.
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f1b520 issig=0 <pyframe_...@plt>
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f2afd2 issig=0 <PyObject_IsSubclass+2>
> Objects/abstract.c:2975
>  UW#5  : 0x1a234590 issig=0     [heap 19557000-1ab6f000]
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f81fd0 issig=0 <tupledealloc+0> Objects/tupleobject.c:162
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f6be3c issig=0 <PyObject_GenericGetAttr+140>
> Objects/object.c:1441
>  UW#5  : 0x7fff83a9ac70 issig=0 [stack: 7fff83a71000-7fff83aae000]
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f6e2b0 issig=0 <PyObject_Free+0> Objects/obmalloc.c:927
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f6a010 issig=0 <meth_dealloc+0> Objects/methodobject.c:132
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f1edf0 issig=0 <_pytype_loo...@plt>
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f6bdb0 issig=0 <PyObject_GenericGetAttr+0>
> Objects/object.c:1314
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f6c250 issig=0 <PyObject_GetAttr+0> Objects/object.c:1173
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f1d130 issig=0 <pyint_froml...@plt>
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f43750 issig=0 <getset_get+0> Objects/descrobject.c:140
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x3067029ef0 issig=0 <__ctype_b_loc>
>  UWEND 0
>
> UWTRACE:
>  UW#1  : 0x2b99976fec5f issig=0
>  UW#2  : 0x2b9997702260 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x2b9999f88280 issig=0 <_PyType_Lookup+288>
> Objects/typeobject.c:2514
>  UW#5  : 0x00000001
> Program received signal SIGSEGV, Segmentation fault.
> access_mem (as=0x2b999a217bc0, addr=1, val=0x7fff83a9ee40, write=0,
> arg=0x7fff83a9ee70) at ../../../libunwind/src/x86_64/Ginit.c:167
> 167           *val = *(unw_word_t *) addr;
>
> (e) An example stack where issig=-9999 means I omitted the check because of
> a suspicious address. It looks like apart from the bogus #5 entry the stack
> tracing is plausible and correct.
>
> UWTRACE:
>  UW#1  : 0x2b0062ec1c5f issig=0
>  UW#2  : 0x2b0062ec5270 issig=0
>  UW#3  : 0x3067c0e930 issig=1
>  UW#4  : 0x3067040672 issig=0
>  UW#5  : 0x0000002e issig=-9999
>  UW#6  : 0x3067044ad6 issig=0
>  UW#7  : 0x306706988a issig=0
>  UW#8  : 0x2b006592d95f issig=0
>  UW#9  : 0x2b0065931373 issig=0
>  UW#10 : 0x2b00659316ba issig=0
>  UW#11 : 0x2b0065944bce issig=0
>  UW#12 : 0x2b00632e7870 issig=0
>  UW#13 : 0x2b006328fe96 issig=0
>  UW#14 : 0x2b006313a907 issig=0
>  UW#15 : 0x2b0063140f04 issig=0
>  UW#16 : 0x2b0063548d1b issig=0
>  UW#17 : 0x2b0063549138 issig=0
>  UW#18 : 0x2b00635518d3 issig=0
>  UW#19 : 0x2b00635466a6 issig=0
>  UW#20 : 0x2b00656ea4a8 issig=0
>  UW#21 : 0x2b006578d4e2 issig=0
>  UW#22 : 0x2b006579032a issig=0
>  UW#23 : 0x2b006579032a issig=0
>  UW#24 : 0x2b006579032a issig=0
>  UW#25 : 0x2b006579032a issig=0
>  UW#26 : 0x2b006579032a issig=0
>  UW#27 : 0x2b0065790d69 issig=0
>  UW#28 : 0x2b0065790e82 issig=0
>  UW#29 : 0x2b00657aea2c issig=0
>  UW#30 : 0x2b00631326c9 issig=0
>  UW#31 : 0x2b00631328e3 issig=0
>  UW#32 : 0x2b0063132be7 issig=0
>  UW#33 : 0x2b0063127cfb issig=0
>  UW#34 : 0x0040ccfa issig=0
>  UW#35 : 0x306701d994 issig=0
>  UW#36 : 0x0040b2e9 issig=0
>  UWEND 0
>
>
>
> _______________________________________________
> Libunwind-devel mailing list
> [email protected]
> http://lists.nongnu.org/mailman/listinfo/libunwind-devel
>



-- 
Sheng, Tianwei
Inst. of High Performance Computing
Dept. of Computer Sci. & Tech.
Tsinghua Univ.
_______________________________________________
Libunwind-devel mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/libunwind-devel

Reply via email to