Hi Don,

> We find the same with regards to signals and callstack profiling
> in the OpenSpeedShop tool. We typically patch src/x86_64/Gstep.c as the
> systems we currently support typically crashed in access_mem. I understand
> that the libunwind maintainers are concerned with performance issues
> when validation is always on and maybe a configuration option to
> force validation is needed to get such a patch applied (i.e.
> src/x86_64/Ginit_local.c setting c->validate = 1).

Right, I know, but for us x86-64 performance is so far from what we need
that I am working on another patch to gain performance in a different way
in any case. On the other hand, without validation our user experience is
dreadful - fewer than one run in 10 on a largish application escapes death
in access_mem. I did try more selective validation, but it didn't work for us.

I'll post the prototype fast trace patch for discussion soon; see
http://thread.gmane.org/gmane.comp.lib.unwind.devel/480 for the initial
discussion on the subject.

Your fix also turns validation always on for x86-64; you just do it in the
two code locations that follow unw_init_local/remote (unw_step() and
unw_is_signal_frame())?

Lassi

>
> We turn validation on at the "Try DWARF-base unwinding..." in GStep.c:
> *** libunwind-20100123/src/x86_64/Gstep.c 2010-02-08 11:34:10.000000000 -0500
> --- libunwind-0.99-X/src/x86_64/Gstep.c 2009-05-12 15:28:27.000000000 -0500
> ***************
> *** 39,44 ****
> --- 39,47 ----
>          c, (unsigned long long) c->dwarf.ip);
>
>     /* Try DWARF-based unwinding... */
> +   /* need to validate here too. Intel compiler generated code
> +    * crashes with segv and sigbus on large mvapich jobs. */
> +   c->validate = 1;
>     ret = dwarf_step (&c->dwarf);
>
>     if (ret < 0 && ret != -UNW_ENOINFO)
> *** libunwind-20100123/src/x86_64/Gis_signal_frame.c 2010-02-08 11:34:10.000000000 -0500
> --- libunwind-0.99-X/src/x86_64/Gis_signal_frame.c 2009-05-12 15:27:21.000000000 -0500
> ***************
> *** 38,43 ****
> --- 38,44 ----
>     void *arg;
>     int ret;
>
> +   c->validate = 1;
>     as = c->dwarf.as;
>     a = unw_get_accessors (as);
>     arg = c->dwarf.as_arg;
>
> This works for us on the nastiest cases we have seen (very large
> simulation code at LLNL) and we do not see a noticeable performance hit
> in the callstack profiler we use. That particular app would eventually
> access bad memory attempting to unwind through Intel's fast memcpy routines.
> We would also notice memory access crashes at high cpu counts when
> profiling large mpi jobs. The above patch fixed the crashes (we still see
> a very small number of truncated callstacks that are likely related to
> other issues your patches appear to address).
> We have also successfully profiled a large mpi benchmark (12000 cores)
> on a cray-xt5 using libunwind with the above patch.
>
> Thanks for your work on this!
>
> regards,
> Don
>
>> By far the biggest reason for this is inaccurate unwind information for
>> function epilogues - the exit paths from the function don't have any unwind
>> info, causing endless havoc if you happen to sample the stack there. There
>> have been a number of recent updates to GCC on this, but I am not sure if
>> they all made it even to 4.5.0 which was released just a few days ago.
>> Anything before 4.5.0 is certainly prone to have significant issues of this
>> sort.
>>
>> GDB will also fail to produce a useful stack trace in comparable
>> circumstances. The fix needs to come from the compiler.
>>
>> Similar caveats of course apply to debug info produced by other means. One
>> version of GLIBC I looked at has incorrect (manually entered) unwind info
>> for at least one function.
>>
>> Regards,
>> Lassi
>>
>> _______________________________________________
>> Libunwind-devel mailing list
>> [email protected]
>> http://lists.nongnu.org/mailman/listinfo/libunwind-devel
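
The thread above turns on whether an unwinder should check that an address is mapped before dereferencing it. The following is a minimal sketch of that general idea in C, not libunwind's own code: the helper names addr_is_mapped and safe_read_word are hypothetical, and probing with msync() is simply one common POSIX technique assumed here for illustration. It also assumes Linux-like msync() behaviour, where an unmapped page makes the call fail.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not a libunwind function): returns 1 if the page
 * containing addr is currently mapped in this process, 0 otherwise.
 * msync() requires a page-aligned address and fails when the page is
 * not mapped. */
static int addr_is_mapped(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    /* Round the address down to the start of its page. */
    uintptr_t page = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
    return msync((void *)page, 1, MS_ASYNC) == 0;
}

/* Hypothetical guarded read: copies one word from addr into *out only if
 * both the first and the last byte of the word lie on mapped pages.
 * Returns 0 on success and -1 if the read was refused. */
static int safe_read_word(const void *addr, unsigned long *out)
{
    const char *last = (const char *)addr + sizeof *out - 1;

    if (!addr_is_mapped(addr) || !addr_is_mapped(last))
        return -1;

    memcpy(out, addr, sizeof *out);
    return 0;
}
```

A check like this costs a system call per probe, which is the kind of overhead the performance concern in the thread refers to. It also only shows that a page is mapped, not that it is readable, so a page mapped without read permission would still fault.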
