This patch looks good to me (sorry for the delay in replying -- MPI Forum + OMPI dev meeting got in the way).
Brian -- do you have any opinions on it?

On Dec 11, 2013, at 1:43 AM, Kawashima, Takahiro <t-kawash...@jp.fujitsu.com> wrote:

> Hi,
>
> Open MPI's signal handler (show_stackframe function defined in
> opal/util/stacktrace.c) calls non-async-signal-safe functions
> and it causes a problem.
>
> See attached mpisigabrt.c. Passing corrupted memory to realloc(3)
> will cause SIGABRT and show_stackframe function will be invoked.
> But invoked show_stackframe function deadlocks in backtrace_symbols(3)
> on some systems because backtrace_symbols(3) calls malloc(3)
> internally and a deadlock of realloc/malloc mutex occurs.
>
> Attached mpisigabrt.gstack.txt shows the stacktrace gotten
> by gdb in this deadlock situation on Ubuntu 12.04 LTS (precise)
> x86_64. Though I could not reproduce this behavior on RHEL 5/6,
> I can reproduce it also on K computer and its successor PRIMEHPC FX10.
> Passing non-heap memory to free(3) and double-free also cause
> this deadlock.
>
> malloc (and backtrace_symbols) is not marked as async-signal-safe
> in POSIX and current glibc, though it seems to have been marked
> in old glibc. So we should not call it in the signal handler now.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
> http://cygwin.com/ml/libc-help/2013-06/msg00005.html
>
> I wrote a patch to address this issue. See the attached
> async-signal-safe-stacktrace.patch.
>
> This patch calls backtrace_symbols_fd(3) instead of backtrace_symbols(3).
> Though backtrace_symbols_fd is not declared as async-signal-safe,
> it is described not to call malloc internally in its man. So it
> should be rather safer.
>
> Output format of show_stackframe function is not changed by
> this patch. But the opal_backtrace_print function (backtrace
> framework) interface is changed for the output format compatibility.
> This requires changes in some additional files (ompi_mpi_abort.c
> etc.).
>
> This patch also removes unnecessary fflush(3) calls, which are
> meaningless for write(2) system call but might cause a similar
> problem.
>
> What do you think about this patch?
>
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> <async-signal-safe-stacktrace.patch><mpisigabrt.c><mpisigabrt.gstack.txt>

-- 
Jeff Squyres
jsquy...@cisco.com